Methods of Generating Libraries of Nucleic Acid Sequences for Detection via Fluorescent in Situ Sequ

ABSTRACT

The present disclosure provides a number of targeted nucleic acid FISSEQ library construction methods. Targeted FISSEQ can exhibit several benefits, such as enhanced sensitivity and/or shorter assay time in the detection, identification, quantification, and/or determining the nucleotide sequence of the target species, relative to “random” or “whole-omic” detection via FISSEQ.

RELATED APPLICATION DATA

This application is a continuation application, which claims priority to PCT Application No. PCT/US17/49633 designating the United States and filed Aug. 31, 2017; which claims the benefit of U.S. Provisional Application No. 62/381,980 filed Aug. 31, 2016, each of which is hereby incorporated by reference in its entirety.

STATEMENT AS TO FEDERALLY SPONSORED RESEARCH

This invention was made with government support under Grant Nos. P50HG005550 and RM1 HG008525 awarded by the National Institutes of Health and Grant No. DGE1144152 awarded by the National Science Foundation. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION

Randomly capturing RNA sequences for in situ sequencing enables de novo measurement of both sequence variation and the spatial organization of gene expression. Yet it is recognized that for many applications, sensitive detection of a targeted subset of RNA species is incredibly valuable. For example, it is desirable to accurately detect the expression of certain genes that are known to be clinically relevant to diagnosis, prognosis, and therapeutic guidance for human diseases. In the same way, randomly capturing DNA sequences for in situ sequencing enables de novo measurement of both sequence variation and the spatial organization of genomes and DNA molecules. Yet it is recognized that for many applications, sensitive detection of a targeted subset of DNA loci or sites of variation is incredibly valuable. For example, it is desirable to accurately detect the presence of certain genomic mutations or genotypes that are known to be clinically relevant to diagnosis, prognosis, and therapeutic guidance for human diseases. There remains a need for the development of methods that allow accurate and efficient detection of nucleic acids (e.g., DNA and RNA) via fluorescent in situ sequencing (FISSEQ).

SUMMARY OF THE INVENTION

In various instances, the present disclosure provides compositions and methods for preparing a library of sequences for fluorescent in situ sequencing (FISSEQ). In one aspect, the present disclosure provides a method for enhancing a hybridization reaction in a cell or cellular matrix. The method comprises: (a) providing said cell or cellular matrix and a reaction mixture, comprising (i) a target nucleic acid molecule, (ii) a probe having sequence complementarity with a target sequence of said target nucleic acid molecule, and (iii) a hybridization reaction enhancing agent comprising a polymer backbone, wherein said hybridization reaction enhancing agent enhances a rate of a hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target molecule, and wherein said hybridization enhancing agent comprises a functional group that facilitates inactivation of said hybridization reaction enhancing agent; and (b) subjecting said reaction mixture to conditions sufficient to conduct said hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target nucleic acid molecule, wherein during said hybridization reaction, said hybridization reaction enhancing agent enhances said rate of said hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target molecule, as compared to another hybridization reaction conducted between said target nucleic acid molecule and said probe in the absence of said hybridization reaction enhancing agent.

In some embodiments, the present disclosure provides a method further comprising, subsequent to (b), subjecting said functional group to conditions sufficient to inactivate said hybridization reaction enhancing agent. In some embodiments, the present disclosure provides the method further comprising inactivating the hybridization reaction enhancing agent.

In some embodiments, the present disclosure provides the method further comprising initiating an enzymatic reaction, wherein the enzymatic reaction comprises reverse transcription, ligation, or DNA polymerization. In some embodiments, said functional group is a hydrating group. In some embodiments, the present disclosure provides that said hydrating group is an ionic, electrolytic, or hydrophilic group.

In some embodiments, the present disclosure provides that the hybridization reaction enhancing agent comprises a cleavable linker between the polymer backbone and the hydrating group. In some embodiments, the present disclosure provides that the cleavable linker comprises alpha-hydroxy acids, beta-keto acids, disulfide linkages, or other types of chemical linkages. In some embodiments, the functional group is cleavable. In some embodiments, the method further comprises triggering cleavage of the functional group. In some embodiments, the method further comprises washing away the functional group. In some embodiments, the method further comprises initiating an enzymatic reaction. In some embodiments, the hybridization enhancing agent is further configured to enhance said enzymatic reaction. In some embodiments, the present disclosure provides that the enzymatic reaction comprises reverse transcription, ligation, or DNA polymerization.

In some embodiments, the present disclosure provides that said functional group is configured to be selectively inactivated by rendering an ionic group to have a neutral charge, or by rendering the hydrating group to be weakly hydrating.

In some embodiments, the present disclosure provides a method for enhancing a hybridization reaction in a cell or cellular matrix, wherein said cell or cellular matrix is integrated with a hydrogel. In some cases, said reaction mixture further comprises a buffer. In some embodiments, said buffer comprises a salt. In some embodiments, said buffer comprises blocking agents configured to reduce non-specific binding of probes to off-target sequences. In some embodiments, said buffer comprises agents configured to alter an annealing property of DNA.

In some embodiments, said polymer backbone is an ionic polymer backbone.

In some embodiments, the hybridization enhancing agent comprises a polyionic, polyelectrolyte, hydrophilic, or hydrating polymer.

In another aspect, the present disclosure provides a probe set for in situ nucleic acid sequence detection or identification of one or more target nucleic acid molecules of a cell. The probe set may comprise a plurality of probes comprising a plurality of target-specific sequences, a plurality of adaptor sequences and a plurality of barcode sequences, wherein a given probe of said plurality of probes comprises: (i) a sequence of said plurality of target-specific sequences that is complementary to a target sequence of a target nucleic acid molecule of said one or more target nucleic acid molecules of said cell; (ii) an adaptor sequence of said plurality of adaptor sequences coupled to said sequence, wherein said adaptor sequence comprises a binding site for a primer for an amplification reaction; and (iii) a barcode sequence of said plurality of barcode sequences coupled to said adaptor sequence, wherein said barcode sequence is configured to allow detection or identification of said target sequence or said at least said portion of said target nucleic acid molecule, and wherein said plurality of barcode sequences are different across said plurality of probes.

In some embodiments, the barcode sequence comprises a gene barcode corresponding to a particular gene, and wherein the gene barcode is configured to allow detection of the particular gene. In some embodiments, the barcode sequence further comprises a sequence barcode corresponding to the sequence complementary to the target region, and wherein the sequence barcode is configured to allow detection of the sequence. In some embodiments, the gene barcode is defined by a first set of sequences of the barcode sequences, and wherein the sequence barcode is defined by the remaining set of sequences of the barcode sequences. In some embodiments, said plurality of barcode sequences permit identification of different target sequences of different target nucleic acid molecules. In some embodiments, said plurality of adaptor sequences are the same across said plurality of probes. In some embodiments, said adaptor sequence is complementary to a primer for conducting said amplification reaction. In some embodiments, said amplification reaction is a rolling circle amplification (RCA) reaction. In some embodiments, a given barcode sequence of said plurality of barcode sequences permits identification of a given sequence of said target region. In some embodiments, the adaptor sequence is located between the sequence complementary to the nucleic acid molecule and the barcode. In some embodiments, the barcode sequence is located between the sequence of said plurality of target-specific sequences and the adaptor sequence. In some embodiments, said target nucleic acid molecule is ribonucleic acid (RNA), and wherein the sequence of said plurality of sequences is configured to prime reverse transcription. In some embodiments, the sequence of said plurality of target-specific sequences is located at a 3′ end of each probe.

In some embodiments, the sequence of said plurality of target-specific sequences, the adaptor sequence, and the barcode sequence are arranged contiguously from the 3′ end to the 5′ end of said given probe. In some embodiments, a 5′ end of said given probe is phosphorylated.
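The contiguous arrangement described above (target-specific sequence at the 3′ end, then the adaptor, then the barcode at the 5′ end) can be sketched in code. The following is a minimal illustration only, assuming an RNA target so that the target-specific sequence is the DNA reverse complement of the target region; the function names and all example sequences are hypothetical and are not taken from the disclosure, and 5′ phosphorylation is not modeled.

```python
# Map each base to its DNA complement; U (RNA) pairs with A.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement of a DNA or RNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def build_probe(target_region: str, adaptor: str, barcode: str) -> str:
    """Assemble a probe written 5'->3' as barcode + adaptor + target-specific sequence.

    Read from the 3' end toward the 5' end, the order is therefore
    target-specific sequence, adaptor, barcode.
    """
    target_specific = reverse_complement(target_region)
    return barcode + adaptor + target_specific


# Hypothetical sequences chosen only for illustration.
probe = build_probe(
    target_region="AUGGCUAGCUAGGAUC",
    adaptor="ACTCTTTCCCTACACGAC",
    barcode="ACGTA",
)
print(probe)
```

Placing the target-specific sequence at the 3′ end is what would allow it to prime reverse transcription on an RNA target, while the barcode at the 5′ end stays outside the primed region.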

The present disclosure also provides a method of generating libraries of probes for detecting nucleic acid in situ with said given probe, comprising: hybridizing said given probe to a nucleic acid sequence to produce a hybridized product, and circularizing the hybridized product, and generating said libraries of probes via an amplification reaction. In some embodiments, circularizing the hybridized product comprises circularizing by a ligase when the probe is annealed to the nucleic acid sequence. In some embodiments, circularizing the hybridized product comprises circularizing by a ligase using an additional splint oligonucleotide independent of the nucleic acid sequence.

In some embodiments, circularizing the hybridized product comprises filling in a gap in the probe with aid of a reverse transcriptase, DNA polymerase, or ligase.

In some embodiments, the nucleic acid sequence is a ribonucleic acid (RNA) or complementary deoxyribonucleic acid (cDNA) sequence. In some embodiments, the nucleic acid sequence is a deoxyribonucleic acid (DNA) sequence.

In some embodiments, the plurality of probes are linear probes. In some embodiments, the plurality of probes are circular probes. In some embodiments, the plurality of probes comprise molecular inversion probes. In some embodiments, the plurality of probes comprise padlock probes. In some embodiments, said given probe of the plurality further comprises processing sites. In some embodiments, the processing sites comprise additional amplification regions. In some embodiments, the additional amplification regions comprise polymerase chain reaction (PCR) primer sequences. In some embodiments, the processing sites comprise additional cutting sites.

The present disclosure also provides a method of maturing the plurality of probes, comprising: cutting away additional amplification regions via the additional cutting sites.

In some embodiments, said given probe of the plurality comprises a sufficient length so as to be circularized. In some embodiments, the sufficient length is equal to or more than 35 nucleotides.

The present disclosure also provides a method of depleting target sequences with said given probe. The method may comprise: hybridizing the probe to a nucleic acid sequence, and depleting said sequence. In some embodiments, said depleting is mediated by an RNase H digestion.

In some embodiments, said depleting is mediated by a Cas9 or other protein-nucleic acid complexes.

In another aspect, the present disclosure provides a method for in situ nucleic acid sequence detection or identification of one or more target nucleic acid molecules of a cell. The method comprises: (a) providing a reaction mixture comprising said one or more target nucleic acid molecules and a plurality of probes, wherein said plurality of probes comprises a plurality of target-specific sequences, a plurality of adaptor sequences and a plurality of barcode sequences, wherein a given probe of said plurality of probes comprises: (i) a sequence of said plurality of target-specific sequences that is complementary to a target sequence of a target nucleic acid molecule of said one or more target nucleic acid molecules; (ii) an adaptor sequence of said plurality of adaptor sequences coupled to said sequence, wherein said adaptor sequence is for conducting an amplification reaction on said given probe when said sequence is hybridized to said target sequence; and (iii) a barcode sequence of said plurality of barcode sequences coupled to said adaptor sequence, wherein said barcode sequence is configured to allow detection or identification of said target sequence or said at least said portion of said target nucleic acid molecule, and wherein said plurality of barcode sequences are different across said plurality of probes; (b) subjecting said reaction mixture to conditions sufficient to permit said sequence to hybridize to said target sequence; and (c) using said barcode to detect or identify said target sequence or said at least said portion of said target nucleic acid molecule.

In some embodiments, the method further comprises, prior to (c), conducting said amplification reaction on said given probe when said sequence is hybridized to said target sequence.

In some embodiments, said target nucleic acid molecule is a ribonucleic acid molecule, and wherein (b) further comprises subjecting said sequence to conditions sufficient to perform reverse transcription amplification on said sequence to yield a complementary deoxyribonucleic acid molecule as an amplification product of said given probe.

In some embodiments, the barcode sequence comprises a gene barcode corresponding to a particular gene, and wherein the gene barcode is configured to allow detection of the particular gene.

In some embodiments, the barcode sequence further comprises a sequence barcode corresponding to the sequence complementary to the target region, and wherein the sequence barcode is configured to allow detection of the sequence.

In some embodiments, the gene barcode is defined by a first set of sequences of the barcode sequences, and wherein the sequence barcode is defined by the remaining set of sequences of the barcode sequences.

In another aspect, the present disclosure provides a method of generating a library of nucleic acid sequences for target nucleic acid sequence detection. The method may comprise: identifying a set of target nucleic acid sequences; designing a plurality of linear probes targeting the set of target nucleic acid sequences; hybridizing the plurality of linear probes to the target nucleic acid sequences; circularizing the plurality of linear probes; and detecting the target nucleic acid sequences by fluorescent in situ sequencing (FISSEQ).

In some embodiments, the probes are probe complexes comprising nucleic acid, DNA transposase, or Cas9. In some embodiments, the plurality of linear probes are circularized by an enzyme. In some embodiments, the enzyme comprises a ligase. In some embodiments, the enzyme further comprises a reverse transcriptase, a polymerase, or both. In some embodiments, the method further comprises amplifying the plurality of circularized probes.

In some embodiments, the plurality of circularized probes are amplified by rolling circle amplification. In some embodiments, each of the plurality of probes comprises an adaptor sequence. In some embodiments, each of the plurality of probes further comprises a barcode sequence. In some embodiments, the plurality of probes are synthesized by DNA microarray. In some embodiments, the plurality of probes are hybridized to the target nucleic acid sequences in the presence of a crowding agent. In some embodiments, the sequencing is sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH). In some embodiments, the target nucleic acids comprise ribonucleic acids or deoxyribonucleic acids. In some embodiments, the deoxyribonucleic acids are double-stranded deoxyribonucleic acids. In some embodiments, the double-stranded deoxyribonucleic acids are converted into single-stranded deoxyribonucleic acids by thermal melting or enzymatic digestion. In some embodiments, the plurality of probes comprises nucleic acid analogs. In some embodiments, the nucleic acid analogs comprise locked nucleic acid (LNA).

In another aspect, the present disclosure provides a method of scoring candidate nucleic acid sequences for targeted nucleic acid sequence detection. The method comprises: identifying, with aid of a processor, target nucleic acid sequences for detection; generating, with aid of the processor, a probe sequence database, wherein the probe sequence database comprises a plurality of different subsequences of the target nucleic acid sequences; and scoring, with aid of the processor, the plurality of different subsequences of the probe sequence database based on predetermined criteria for use in fluorescent in situ sequencing (FISSEQ).
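One way to picture the probe sequence database is as the set of all fixed-length windows taken from each target sequence. The sketch below is a minimal illustration under that assumption; the record layout, the function names, and the example target are hypothetical, and the disclosure does not state that subsequences must be enumerated this way.

```python
from typing import Dict, Iterator, List, Tuple


def candidate_subsequences(target: str, length: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, subsequence) for every window of `length` bases in `target`."""
    for start in range(len(target) - length + 1):
        yield start, target[start:start + length]


def build_probe_database(targets: Dict[str, str], length: int) -> List[dict]:
    """Collect every candidate window from every target into a flat list of records."""
    database = []
    for name, sequence in targets.items():
        for offset, subsequence in candidate_subsequences(sequence.upper(), length):
            database.append({"target": name, "offset": offset, "sequence": subsequence})
    return database


# Hypothetical 10-base target; with 8-base windows it yields 3 candidates.
database = build_probe_database({"geneA": "ACGTACGTAC"}, length=8)
for record in database:
    print(record)
```

Each record keeps the originating target and offset so that a later scoring step can report which region of which target a candidate came from.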

In some embodiments, each of the plurality of candidate probe sequences comprises a predetermined length. In some embodiments, the predetermined length is 15 nucleotides or less. In some embodiments, said scoring is based on existence of a G quadruplex. In some embodiments, said scoring is based on guanine-cytosine content. In some embodiments, said scoring is based on a melting temperature of the plurality of different subsequences. In some embodiments, said scoring is based on an exon pileup. In some embodiments, said scoring is based on likelihood of probe heterodimerization. In some embodiments, said scoring is based on existence of common k-mers. In some embodiments, said scoring is based on a thermodynamic approach.

In some embodiments, the thermodynamic approach comprises using a BLAST algorithm with a short word size to find similarities, generating a similarity matrix, using the similarity matrix to compute thermodynamic values of the plurality of different subsequences, and eliminating subsequences based on a thermodynamic threshold for cross-hybridization. In some embodiments, said scoring is based on enzyme mismatch sensitivity profiles. In some embodiments, said scoring is based on sequence dependency of downstream fluorescent in situ sequencing (FISSEQ) steps. In some embodiments, the sequence dependency is measured using probe-level barcodes. In some embodiments, said scoring is based on five or more different criteria. In some embodiments, the method further comprises excluding, from the probe sequence database, a subset of the plurality of different subsequences of the target nucleic acid sequence. In some embodiments, the subset of the plurality of different subsequences comprises subsequences not meeting the predetermined criteria. In some embodiments, the subset of the plurality of different subsequences comprises subsequences comprising a G quadruplex. In some embodiments, the subset of the plurality of different subsequences comprises subsequences likely to undergo heterodimer formation. In some embodiments, the method further comprises selecting, from the probe sequence database, a subset of the plurality of different subsequences of the target nucleic acid sequence for synthesizing libraries of nucleic acid sequences for said targeted nucleic acid sequence detection.
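Three of the scoring criteria named above (guanine-cytosine content, melting temperature, and existence of a G quadruplex) lend themselves to a short illustration. The sketch below is an assumption-laden example rather than the disclosed scoring method: the melting temperature uses the simple Wallace rule (2 °C per A/T and 4 °C per G/C), which is only a rough estimate for short oligonucleotides; the G quadruplex test uses a commonly cited sequence motif; and the threshold values and function names are hypothetical.

```python
import re

# Commonly used motif for a putative G quadruplex: four runs of three or more
# G's separated by loops of 1 to 7 bases. This motif is an assumption here.
G4_MOTIF = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def has_g_quadruplex(seq: str) -> bool:
    """True if the sequence contains the putative G quadruplex motif."""
    return G4_MOTIF.search(seq) is not None


def score_candidate(seq: str, gc_range=(0.4, 0.6), tm_range=(40, 60)) -> dict:
    """Score one candidate subsequence against hypothetical thresholds."""
    seq = seq.upper()
    gc = gc_fraction(seq)
    tm = wallace_tm(seq)
    g4 = has_g_quadruplex(seq)
    passes = (
        gc_range[0] <= gc <= gc_range[1]
        and tm_range[0] <= tm <= tm_range[1]
        and not g4
    )
    return {"sequence": seq, "gc": gc, "tm": tm, "g_quadruplex": g4, "passes": passes}


print(score_candidate("ACGTACGTACGTACG"))
print(score_candidate("GGGAGGGTGGGAGGG"))
```

Candidates whose record has `passes` set to `False` would be the ones excluded from the probe sequence database; criteria such as heterodimerization or cross-hybridization thermodynamics would need additional, more involved checks that are not shown.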

In some embodiments, the method further comprises synthesizing libraries of nucleic acid sequences based on said scoring. In some embodiments, said libraries of nucleic acid sequences comprise a subset of the plurality of different subsequences selected based on said predetermined criteria. In some embodiments, the method further comprises incorporating an adaptor sequence with each of the plurality of different subsequences. In some embodiments, the adaptor sequence comprises a T2S sequencing primer. In some embodiments, the method further comprises incorporating a barcode to each of the plurality of different subsequences. In some embodiments, said assigning comprises: providing a pool of barcodes, and identifying a set of g barcodes with Hamming distance h. In some embodiments, the pool of barcodes is derived from a pool of k-mers. In some embodiments, the pool of barcodes excludes homopolymers and GV runs >=4, or G quadruplexes. In some embodiments, said identifying the set of g barcodes with Hamming distance h is accomplished using a graph based algorithm.
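The barcode assignment described above can be illustrated with a small example. The sketch below enumerates a pool of k-mers, drops those containing a homopolymer run of four or more bases, and then picks g barcodes whose pairwise Hamming distance is at least h. The selection step is a plain greedy pass, used here as a simple stand-in; it is not claimed to be the graph based algorithm of the disclosure, and the function names and parameter values are hypothetical.

```python
from itertools import product
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    return sum(x != y for x, y in zip(a, b))


def has_homopolymer_run(seq: str, run_length: int = 4) -> bool:
    """True if any base repeats `run_length` or more times in a row."""
    return any(base * run_length in seq for base in "ACGT")


def barcode_pool(k: int, run_length: int = 4) -> List[str]:
    """All k-mers over ACGT without a homopolymer run of `run_length` or more."""
    kmers = ("".join(bases) for bases in product("ACGT", repeat=k))
    return [kmer for kmer in kmers if not has_homopolymer_run(kmer, run_length)]


def select_barcodes(pool: List[str], g: int, h: int) -> List[str]:
    """Greedily pick g barcodes with pairwise Hamming distance of at least h."""
    chosen: List[str] = []
    for candidate in pool:
        if all(hamming(candidate, other) >= h for other in chosen):
            chosen.append(candidate)
            if len(chosen) == g:
                return chosen
    raise ValueError(f"only {len(chosen)} barcodes found with distance >= {h}")


pool = barcode_pool(k=5)
barcodes = select_barcodes(pool, g=10, h=3)
print(len(pool), barcodes)
```

With a minimum distance of 3, any single-base sequencing error leaves a read closer to its true barcode than to any other barcode in the set, which is the property that allows single-error correction.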

In some embodiments, the method provided herein further comprises amplifying the libraries of nucleic acid sequences. In some embodiments, a fraction of the plurality of different subsequences is amplified. In some embodiments, the method further comprises purifying the libraries of nucleic acid sequences. In some embodiments, the method further comprises utilizing the libraries of nucleic acid sequences. In some embodiments, said utilizing the plurality of different subsequences comprises utilizing the plurality of different subsequences in fluorescent in situ sequencing (FISSEQ).

In another aspect, the present disclosure provides a method of generating libraries of nucleic acid sequences for targeted nucleic acid sequence detection, the method comprising: identifying a set of target nucleic acid sequences for detection; generating reference sequence databases comprising sequence portions from the set of target nucleic acid sequences; selecting candidate sequence portions from the reference sequence databases; designing, computationally, the libraries of nucleic acid sequences, wherein said designing comprises scoring said candidate sequence portions according to predetermined criteria; synthesizing the libraries of nucleic acid sequences; amplifying the libraries of nucleic acid sequences; purifying the libraries of nucleic acid sequences; and validating the libraries of nucleic acid sequences for targeted nucleic acid sequence detection, wherein the targeted nucleic acid sequence detection is via fluorescent in situ sequencing (FISSEQ). In some embodiments, the libraries of nucleic acid sequences comprise a sequence portion that is complementary to the candidate sequence portion of the target nucleic acid sequences.

In some embodiments, the libraries of nucleic acid sequences further comprise an adaptor sequence. In some embodiments, the libraries of nucleic acid sequences further comprise a barcode sequence. In some embodiments, the libraries of nucleic acid sequences are complexed with one or more proteins. In some embodiments, the libraries of nucleic acid sequences are synthesized on a DNA microarray. In some embodiments, the libraries of nucleic acid sequences comprise guide RNAs that are complexed with CRISPR enzymes for FISSEQ detection of the guide RNAs, and wherein each of the guide RNAs comprises an adaptor sequence. In some embodiments, the method comprises validating the nucleic acid sequences, wherein validating the nucleic acid sequences is by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH). In some embodiments, targeted nucleic acid sequence detection is by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH).

In some embodiments, the libraries of nucleic acid sequences are hybridized to the set of target nucleic acid sequences in situ for detection. In some embodiments, a crowding agent is included for enzyme-compatible enhancement of hybridization between the libraries of nucleic acid sequences and the set of target nucleic acid sequences. In some embodiments, the method further comprises circularizing the library of nucleic acid sequences hybridized to the target nucleic acid sequences. In some embodiments, the library of nucleic acid sequences is circularized by an enzyme. In some embodiments, the enzyme comprises a ligase. In some embodiments, the enzyme further comprises a reverse transcriptase, a polymerase, or both.

In some embodiments, the library of nucleic acid sequences is circularized when hybridized to a splint oligonucleotide. In some embodiments, the circularized library of nucleic acid sequences is amplified by rolling circle amplification. In some embodiments, the target nucleic acids comprise ribonucleic acids or deoxyribonucleic acids. In some embodiments, the deoxyribonucleic acids are double-stranded deoxyribonucleic acids. In some embodiments, the double-stranded deoxyribonucleic acids are converted into single-stranded deoxyribonucleic acids by thermal melting or enzymatic digestion. In some embodiments, the library of nucleic acid sequences comprises nucleic acid analogs. In some embodiments, the nucleic acid analogs comprise locked nucleic acid (LNA).

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:

FIG. 1A illustrates a mature primer comprising: (a) sequence complementary to the RNA molecule at the 3′ end, which anneals to the RNA molecule in situ and primes RT; (b) a common adaptor sequence, from which RCA and sequencing reactions are primed; (c) a gene-level barcode at the 5′ end; and 5′ phosphorylation, in accordance with embodiments.

FIG. 1B illustrates that the complementary region of the primer anneals to the target RNA species and primes an RT reaction, incorporating RNA-templated bases into the cDNA, in accordance with embodiments.

FIG. 1C illustrates that in the linear RCA amplicon, each of the n tandem repeats contains the barcode as well as adjacent RNA-templated sequence, enabling quantification of capture specificity, in accordance with embodiments.

FIG. 2A illustrates exemplary results of validation of a microarray-synthesized FISSEQ primer library, in accordance with embodiments.

FIG. 2B illustrates that a significant fraction of the pool contains a payload that matches the expected size, which is indicated by the dotted line, in accordance with embodiments.

FIG. 2C illustrates a violin plot of the distribution of payloads of FIG. 2B, showing that most members of the library are present at an average of a few copies, in accordance with embodiments.

FIGS. 3A-F depict exemplary targeted FISSEQ capture schemes, in accordance with embodiments.

FIGS. 4A-E depict an exemplary probe design and maturation strategy for manufacture of padlock or gap-fill probes by oligonucleotide library synthesis, such as by DNA microarray, in accordance with embodiments.

FIG. 5 illustrates an exemplary image of targeted FISSEQ of a human breast cancer tissue biopsy sample, in accordance with embodiments.

FIG. 6 illustrates an exemplary experimental data summary of sequencing from FIG. 5, in accordance with embodiments.

FIG. 7 shows a computer control system that is programmed or otherwise configured to implement methods provided herein.

DETAILED DESCRIPTION OF THE INVENTION

Targeted FISSEQ

Fluorescent in situ sequencing (FISSEQ) can refer to a method to detect or sequence 3-dimensionally arranged targets in situ within a matrix, wherein the detection signal is a fluorescent signal. Sequencing methods that can be employed by FISSEQ can be sequencing by synthesis, sequencing by ligation, or sequencing by hybridization. The targets detected or sequenced in FISSEQ can be a biomolecule of interest or a probe bound to the biomolecule of interest.

Targeted FISSEQ may have the potential for greater per-molecule sensitivity, as cellular volume that would otherwise be occupied by RCA amplicons containing cDNA or DNA sequence irrelevant to a biological phenomenon can be reallocated to the subset of RNA or DNA species of interest. Moreover, for RNA capture, random hexamer priming of reverse transcription may not be particularly efficient. See e.g., Stahlberg, Anders, et al. “Properties of the reverse transcription reaction in mRNA quantification.” Clinical chemistry 50.3 (2004): 509-515. In some instances, targeted FISSEQ may have a sensitivity equal to or greater than about 5 times, 10 times, 20 times, 40 times, 80 times, 120 times, 160 times, 200 times, 400 times, 1000 times, 5000 times, or more that of random FISSEQ. For example, assume that an RCA amplicon occupies 0.04 um^3 of intracellular volume, approximately corresponding to the volume of a diffraction-limited voxel under standard microscopy conditions. Human cells range in approximate average volume from 100 um^3 (erythrocytes) to 4,000,000 um^3 (oocytes), corresponding to a maximum of 2,500 to 100,000,000 diffraction-limited voxels per human cell, depending on type. At the lower end, many human cells, such as erythrocytes, neutrophils, beta cells, enterocytes, fibroblasts, and so on, can contain fewer than approximately 100,000 diffraction-limited voxels. However, a typical mammalian cell may contain as many as 500,000 mRNA molecules, corresponding to tens of thousands of genetic species. GAPDH, a house-keeping gene, is commonly assumed to be expressed at a copy number of 500 molecules per cell. Given an erythrocyte with a volume of 100 um^3 and containing 500,000 mRNA molecules, a perfect space-filling FISSEQ method could detect at most 2,500 of the 500,000 mRNA molecules, or 0.5%, which corresponds to approximately 2.5 of the 500 GAPDH molecules on average, if sampled randomly. However, a targeted GAPDH FISSEQ assay with optimized efficiency of 80% per-molecule capture, comparable to estimates of single molecule FISH sensitivity, would detect on average approximately 400 of the 500 GAPDH molecules, representing a 160-fold improvement in sensitivity. For lower-expressed genes, such as transcription factors, which may be expressed at only a few copies of RNA per cell (say, 5 molecules per cell), untargeted FISSEQ may detect on average zero molecules per cell, whereas an 80% per-molecule efficient targeted FISSEQ assay would detect 4 of 5 transcription factor RNA molecules, representing a nearly infinite improvement in sensitivity. As noted above, random hexamer priming of reverse transcription may not be efficient for RNA capture; other sequence capture methodologies and probe designs may have better capture efficiencies. For DNA capture, targeted capture may also benefit from enhanced per-molecule sensitivity.
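The arithmetic behind the sensitivity comparison above can be reproduced with a few lines of code. The sketch below simply restates the figures given in the text (0.04 um^3 per amplicon, a 100 um^3 cell, 500,000 mRNA molecules, 500 GAPDH copies, 80% targeted capture efficiency); the function and constant names are hypothetical, and the model assumes, as the text does, that random capture samples transcripts uniformly until the cell volume is full.

```python
# One RCA amplicon is assumed to fill one diffraction-limited voxel of 0.04 um^3,
# i.e. 25 amplicons per um^3.
AMPLICONS_PER_UM3 = 25


def max_amplicons(cell_volume_um3: float) -> float:
    """Upper bound on amplicons that fit in a cell of the given volume."""
    return cell_volume_um3 * AMPLICONS_PER_UM3


def random_capture_detected(copies: float, total_mrna: float, cell_volume_um3: float) -> float:
    """Expected detected copies of one gene when amplicon slots are filled at random."""
    fraction_sampled = min(1.0, max_amplicons(cell_volume_um3) / total_mrna)
    return copies * fraction_sampled


def targeted_capture_detected(copies: float, efficiency: float = 0.8) -> float:
    """Expected detected copies of one gene for a targeted assay of given efficiency."""
    return copies * efficiency


random_gapdh = random_capture_detected(copies=500, total_mrna=500_000, cell_volume_um3=100)
targeted_gapdh = targeted_capture_detected(copies=500)
fold_improvement = targeted_gapdh / random_gapdh

print(max_amplicons(100), max_amplicons(4_000_000))
print(round(random_gapdh, 2), round(targeted_gapdh, 2), round(fold_improvement, 1))
```

The same functions applied to a gene with 5 copies per cell give an expected 0.025 molecules for random capture versus 4 for targeted capture, which is the low-expression case discussed in the text.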

Targeted FISSEQ can also be a substantially faster assay than wholetranscriptome RNA FISSEQ or whole genome DNA FISSEQ. In some instances,targeted FISSEQ may be about 2 times, 3 times, 4 times, 6 times, 8times, 10 times, 12 times, 14 times, 16 times, 18 times, or 20 timesfaster than whole transcriptome RNA FISSEQ or whole genome DNA FISSEQ.As one example, whole-transcriptome FISSEQ may require a sequencing readlong enough for high-precision short read alignment. In other words, thesequencing read may need to be long enough to computationally determinethe originating molecular species, such as by alignment of thesequencing read to a genomic or transcriptomic reference sequencedatabase. In such whole-omic applications, RNA-seq reads may need to beapproximately 20-30 bases long, while genomic reads may need to belonger, such as 50-100 bases long, in order to recover substantiallyaccurate alignments. For targeted FISSEQ of barcode molecular labels,where the barcode labels may understood to be nucleic acid sequenceswith 4{circumflex over ( )}N complexity given a sequencing read of Nbases, a much shorter sequencing read may be required for molecularidentification. For example, 1024 molecular species may be identifiedusing a 5-nucleotide barcode sequence (4{circumflex over ( )}5=1024),whereas 8 nucleotide barcodes can be used to identify up to 65,536molecular species, a number greater than the total number of distinctgenes in the human genome. Therefore, a targeted FISSEQ assay designedto detect each gene in the human transcriptome may be nearly 4× faster(8 bases vs 30 bases), and in the human genome up to more than 12×faster (8 bases vs 100 bases). When targeting specific RNA species forreverse transcription, the space of potential cDNA sequences can be asignificant subset of the entire transcriptome, and therefore fewerbases of sequencing are required to identify the target molecule. 
When targeting specific DNA loci or nucleotides for sequencing or re-sequencing, the space of captured sequences can be a significant subset of the entire genomic sequence or cellular DNA sequence. Targeted FISSEQ strategies where molecular “barcode” sequences contained in the probes are detected, rather than endogenous sequences, can be an efficient read-out in terms of information per cycle of sequencing. Because the barcode sequences are pre-determined, they can also be designed to feature error detection and correction mechanisms.
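The relationship between barcode length, the number of identifiable species, and the reduction in sequencing cycles can be sketched as follows. This is only an illustration of the 4^N arithmetic given above; the function names are hypothetical, and the figure of 20,000 species is an illustrative count rather than a value from the disclosure.

```python
def barcode_capacity(n_bases):
    """Theoretical number of distinct barcodes of n_bases nucleotides (4^N)."""
    return 4 ** n_bases


def min_barcode_length(n_species):
    """Smallest N such that 4^N is at least n_species."""
    n = 0
    while 4 ** n < n_species:
        n += 1
    return n


def cycle_reduction(read_length, barcode_length):
    """Ratio of bases sequenced for alignment to bases sequenced for a barcode."""
    return read_length / barcode_length


print(barcode_capacity(5), barcode_capacity(8))      # 4^5 and 4^8
print(min_barcode_length(20_000))                    # illustrative species count
print(cycle_reduction(30, 8), cycle_reduction(100, 8))
```

The two ratios printed on the last line correspond to the "nearly 4×" (30 bases vs 8 bases) and "more than 12×" (100 bases vs 8 bases) figures in the text.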

Targeted FISSEQ can be applied to any sample from which spatial information is of interest. For example, the sample can be a biological sample, including a cell, a tissue, and a cellular matrix. Depending on the application, the biological sample can also be whole blood, serum, plasma, mucosa, saliva, cheek swab, urine, stool, cells, tissue, bodily fluid or a combination thereof.

The present disclosure provides for various targeted nucleic acid FISSEQ library construction methods. Therefore, even though the disclosure will not explicitly enumerate all possible implementations, it should be understood that the general descriptions of these approaches can be extended or combined in a variety of ways. These strategies may vary in the number and type of enzymatic reactions required to construct the in situ sequencing library, from the most elaborate (e.g. a targeted RNA FISSEQ method closely mirroring the random capture protocol, but using a pool of specific RT primers), to the simplest methods requiring no enzymatic reactions, only nucleic acid hybridization.

The term “nucleic acid” as used herein may refer to a molecule comprising one or more nucleic acid subunits, or nucleotides, and can be used interchangeably with “polynucleotide” or “oligonucleotide”. A nucleic acid may include one or more nucleotides selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide may include a nucleoside and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more phosphate (PO3) groups. A nucleotide can include a nucleobase, a five-carbon sugar (either ribose or deoxyribose), and one or more phosphate groups. Ribonucleotides are nucleotides in which the sugar is ribose. Deoxyribonucleotides are nucleotides in which the sugar is deoxyribose. A nucleotide can be a nucleoside monophosphate or a nucleoside polyphosphate. A nucleotide can be a deoxyribonucleoside polyphosphate, such as, e.g., a deoxyribonucleoside triphosphate (dNTP), which can be selected from deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), deoxyuridine triphosphate (dUTP) and deoxythymidine triphosphate (dTTP), including dNTPs that include detectable tags, such as luminescent tags or markers (e.g., fluorophores). A nucleotide can include any subunit that can be incorporated into a growing nucleic acid strand. Such subunit can be an A, C, G, T, or U, or any other subunit that is specific to one or more complementary A, C, G, T or U, or complementary to a purine (i.e., A or G, or variant thereof) or a pyrimidine (i.e., C, T or U, or variant thereof). In some examples, a nucleic acid is deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or derivatives or variants thereof. A nucleic acid can be single-stranded, double-stranded, triple-stranded, helical, hairpin, etc. In some cases, a nucleic acid molecule is circular. A nucleic acid can have various lengths. 
A nucleic acid molecule can have a length of at least about 10 bases, 20 bases, 30 bases, 40 bases, 50 bases, 100 bases, 200 bases, 300 bases, 400 bases, 500 bases, 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 10 kb, 50 kb, or more. A nucleic acid molecule can be isolated from a cell or a tissue. As embodied herein, the nucleic acid sequences may comprise isolated and purified DNA/RNA molecules, synthetic DNA/RNA molecules, or synthetic DNA/RNA analogs.

Nucleic acid analogs can include, but are not limited to, 2′-O-methyl modifications, 2′-O-methyl modified ribose sugars with terminal phosphorothioates and a cholesterol group at the 3′ end, 2′-O-methoxyethyl (2′-MOE) modifications, 2′-fluoro modifications, and 2′,4′ methylene modifications (LNAs). Further exemplary inhibitory nucleic acids include modified oligonucleotides (2′-O-methylated or 2′-O-methoxyethyl), locked nucleic acids (LNA), morpholino oligonucleotides, peptide nucleic acids (PNAs), PNA-peptide conjugates, and LNA/2′-O-methylated oligonucleotide mixmers. For exemplary modifications see, e.g., Valóczi et al., Nucleic Acids Res. 32(22):e175 (2004); Fabiani and Gait, RNA 14:336-46 (2008); Lanford et al., Science 327(5962):198-201 (2010); Elmen et al., Nature 452:896-9 (2008); Gebert et al., Nucleic Acids Res. 42(1):609-21 (2013); Kloosterman et al., PLoS Biol 5(8):e203 (2007); and Elmen et al., Nucleic Acids Res. 36:1153-1162 (2008).

Additional examples of modified nucleotides include, but are not limited to, diaminopurine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine and the like. In some cases, nucleotides may include modifications in their phosphate moieties, including modifications to a triphosphate moiety. Non-limiting examples of such modifications include phosphate chains of greater length (e.g., a phosphate chain having 4, 5, 6, 7, 8, 9, 10 or more phosphate moieties) and modifications with thiol moieties (e.g., alpha-thiotriphosphate and beta-thiotriphosphates). Nucleic acid molecules may also be modified at the base moiety (e.g., at one or more atoms that typically are available to form a hydrogen bond with a complementary nucleotide and/or at one or more atoms that are not typically capable of forming a hydrogen bond with a complementary nucleotide), sugar moiety or phosphate backbone. Nucleic acid molecules may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP) to allow covalent attachment of amine reactive moieties, such as N-hydroxysuccinimide esters (NHS). 
Alternatives to standard DNA base pairs or RNA base pairs in the oligonucleotides of the present disclosure can provide higher density in bits per cubic mm, higher safety (resistant to accidental or purposeful synthesis of natural toxins), easier discrimination in photo-programmed polymerases, or lower secondary structure. Such alternative base pairs compatible with natural and mutant polymerases for de novo and/or amplification synthesis are described in Betz K, Malyshev D A, Lavergne T, Welte W, Diederichs K, Dwyer T J, Ordoukhanian P, Romesberg F E, Marx A. Nat. Chem. Biol. 2012 July; 8(7):612-4, which is herein incorporated by reference for all purposes.

Enzymatic reactions can be challenging to optimize in situ, and are a likely source of reduced capture efficiency per target molecule. However, enzymatic reactions can also increase the amount of information captured in the FISSEQ library. For example, reverse transcription is necessary for de novo detection of RNA sequences, including variation due to RNA editing, alternative splicing, or gene fusions. Without using an enzyme to capture RNA-templated sequence, or to catalyze a base mismatch-sensitive reaction, nucleic acid hybridization alone must be relied upon to confer specificity on the probe-target interaction. In some embodiments, the mismatch sensitivity of Cas9 or another nucleic acid-guided, nucleic acid-binding protein may be used to enhance the capture specificity of a DNA sequence.

In some embodiments, targeted FISSEQ methods provided herein comprise a pool, or library, of short oligonucleotide probes to specifically capture certain nucleic acid molecules. The term “probe” used herein can refer to an oligonucleotide that can bind to a biomolecule target in a sample. The probe can directly or indirectly bind to the target. The probe can be of various lengths. To reduce the cost of synthesizing the probe pools, in some embodiments, microarray DNA synthesis platforms can be used to generate massively complex short (approximately 200 nucleotide) oligonucleotide libraries. See e.g., Kosuri, Sriram, and George M. Church. “Large-scale de novo DNA synthesis: technologies and applications.” Nature methods 11.5 (2014): 499-507. Exemplary platforms may include platforms provided by Agilent, CustomArray, and Twist Bioscience. Microarray synthesis may refer to the synthesis of DNA or nucleic acid analog oligonucleotides attached to a solid substrate. Commercial supplier Twist Bioscience, for example, features microarrays containing 9,600 wells with 121 discrete oligonucleotide species synthesized per well, for a total of 1.16 million oligonucleotides per array. Commercial supplier Agilent's OLS libraries contain just over 244,000 oligonucleotide species, while the DNA microarray of commercial supplier CustomArray synthesizes just over 94,000. These libraries of oligonucleotides are typically liberated from the solid support substrate into a solution of DNA species representing a renewable source of single-stranded DNA probes, generated using techniques to highly amplify and process the library. See e.g., Beliveau, Brian J., Nicholas Apostolopoulos, and Chao-ting Wu. “Visualizing genomes with Oligopaint FISH probes.” Current Protocols in Molecular Biology (2014): 14-23; Chen, Kok Hao, et al. 
“Spatially resolved, highly multiplexed RNA profiling in single cells.” Science 348.6233 (2015): aaa6090. Alternatively, the oligonucleotides may be amplified directly from the solid support, in whole or in specific subpools. See e.g., Kosuri, Sriram, et al. “Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips.” Nature biotechnology 28.12 (2010): 1295-1299.

To enable the microarray synthesis strategy, additional sequences for amplification and subsequent processing and maturation of the probes can be added to the probes. Software to facilitate computational design of the probe sequences, as well as validation by high-throughput sequencing of the final products, are discussed herein. The detailed descriptions of these strategies are presented in the section on DNA Array Synthesis of Probe Pools.

Accordingly, the present disclosure provides that targeted FISSEQ can be used to detect, identify, quantify, and/or determine the nucleotide sequences of a subset of the whole transcriptome or whole genome of a biological sample.

In various embodiments, the present disclosure provides methods of targeting nucleic acid detection via FISSEQ to a subset of the whole transcriptome or whole genome. In some embodiments, the present disclosure provides methods to select a subset of targets from the whole transcriptome or whole genome. In some embodiments, the present disclosure provides methods to design probes for the target detection. In some embodiments, the present disclosure provides methods to synthesize probes for the target detection. The disclosure provides that certain novel methods of targeting nucleic acid detection via FISSEQ presented here utilize DNA microarray synthesis of oligonucleotide “libraries”. In some embodiments, the present disclosure provides different probe design strategies, resulting in probes having different architectures. In some embodiments, the present disclosure provides methods to mature probes, resulting in probes suitable for FISSEQ target detection. The terms “maturation” and “maturing” used herein may refer to a process to make a probe or a probe library suitable to be used in the FISSEQ assay. For example, in some cases, a probe library synthesized from the DNA microarray may need to be further amplified before use in FISSEQ. In these cases, an amplification primer may be incorporated in each of the probes in the original probe library for amplification, but needs to be cleaved after amplification to mature the probe library.

Targeted FISSEQ can exhibit several benefits, such as enhanced sensitivity and/or shorter assay time in the detection, identification, quantification, and/or determining the nucleotide sequence of the target species, relative to “random” or “whole-omic” detection via FISSEQ.

Targeted RNA Capture

Capture by Reverse Transcription

Specific reverse transcription (RT) primers rather than random primers can be used for RNA FISSEQ and can exhibit several advantages as discussed above. However, challenges remain for targeting RNA species using specific RT primers rather than random primers. For example, while in prior studies targeting of the mCherry mRNA was possible, this transcript was expressed at a much higher level than most endogenous genes. See e.g., Lee, Je Hyuk, et al. “Highly multiplexed subcellular RNA sequencing in situ.” Science 343.6177 (2014): 1360-1363. Among the RT primers that were targeted to mCherry mRNA, it was observed that the efficiency of RNA capture fell dramatically as the target site moved away from the 5′ end of the transcript. Since this experiment also used a targeted rolling circle amplification (RCA) primer complementary to cDNA sequence just after the annotated transcription start site, the RT primer position-dependent variation in efficiency could be due to either inefficient circularization of the longer cDNA molecules or reverse transcription terminating prematurely, before reaching the RCA priming sequence. It remains a challenge to rationally design targeted RT primers against other genes to generate appreciable numbers of RCA amplicons. Given that neither the logic of RNA accessibility in situ (for example, which regions of RNA transcripts are generally bound by paused or dense polysome complexes) nor the sequence dependency of reverse transcriptase is fully understood, one approach to targeted RT for FISSEQ of the present disclosure is to massively tile target species with RT primers. In some instances, a target RNA or DNA species may be massively tiled by targeted capture probes by designing a plurality of probes complementary in part, or substantially complementary in part, to the entirety of the nucleic acid species in aggregate. In such instances, a target RNA species k bases long may have k targeting primers. 
For example, a pool of targeted reverse transcription primers with 20 nucleotides of sequence complementarity, or substantial sequence complementarity, to the target species, may contain a primer complementary or substantially complementary to bases 1-20, another complementary or substantially complementary to bases 2-21, and so on, with the last primer complementary or substantially complementary to the bases (k-19) to k, where the target species is k bases long. In other instances, the pool of targeted primers may comprise 1-10, 10-100, 100-1000, or 1 to k primers.
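The tiling scheme can be sketched as a sliding window over the target sequence. The following is a minimal illustration, assuming the target is supplied as an RNA string (A, C, G, U) and that each primer is the DNA reverse complement of its window; the function names and the example sequence are hypothetical. A 20-nucleotide window stepped one base at a time over a target of k bases yields k - 19 windows.

```python
# DNA complement of each RNA base (A->T, C->G, G->C, U->A).
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def reverse_complement_dna(rna):
    """Return the DNA sequence (5'->3') complementary to an RNA sequence."""
    return rna.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def tile_rt_primers(target_rna, primer_len=20):
    """Tile a target with primers offset by one base.

    Returns (first_base, last_base, primer) tuples using 1-based,
    inclusive coordinates on the target.
    """
    k = len(target_rna)
    return [
        (start + 1, start + primer_len,
         reverse_complement_dna(target_rna[start:start + primer_len]))
        for start in range(k - primer_len + 1)
    ]


# Hypothetical 25-base target, used only to show the coordinates.
example_target = "AUGGCCAUUGUAAUGGGCCGCUGAA"
for first, last, primer in tile_rt_primers(example_target):
    print(first, last, primer)
```

In practice such a pool would likely be filtered further (for example, for melting temperature or off-target complementarity); those steps are outside the scope of this sketch.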

To facilitate fast detection, a barcoded FISSEQ strategy can be used, which reduces the number of sequencing cycles necessary to detect the molecular identity of each RCA amplicon (DNA nanoball). In some embodiments, a mature targeted RT primer can include the following features: a sequence complementary to the RNA molecule at the 3′ end, which anneals to the RNA molecule in situ and primes RT; a common adaptor sequence, from which RCA and sequencing reactions are primed; a gene-level barcode at the 5′ end; and 5′ phosphorylation (FIGS. 1A-1C). The term “gene-level barcode” used herein can refer to a barcode sequence specific to a particular gene. In some embodiments, a unique barcode may be used for one or more genes. As used herein, the terms gene-level barcode and gene barcode may be used interchangeably.
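The layout of a mature targeted RT primer, read 5′ to 3′, can be sketched as a small data structure. This is an illustrative sketch only: the class name, field names, and all example sequences (including the adaptor) are hypothetical placeholders, not sequences from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetedRTPrimer:
    """Features of a mature targeted RT primer, listed 5' to 3'."""
    gene_barcode: str        # gene-level barcode at the 5' end
    adaptor: str             # common adaptor priming RCA and sequencing
    target_complement: str   # 3' region that anneals to the RNA and primes RT
    five_prime_phosphate: bool = True

    def sequence(self):
        """Concatenate the features in 5'->3' order."""
        return self.gene_barcode + self.adaptor + self.target_complement

    def annotated(self):
        """Human-readable form marking the 5' phosphate, if present."""
        prefix = "5'-P-" if self.five_prime_phosphate else "5'-"
        return prefix + self.sequence() + "-3'"


# Placeholder sequences for illustration only.
primer = TargetedRTPrimer(
    gene_barcode="ACGTA",
    adaptor="TCTCGGGAACGCTGAAGA",
    target_complement="TTCAGCGGCCCATTACAATG",
)
print(primer.annotated())
```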

The gene-level barcode can be of various lengths, for example, 1-3 nucleotides in length, 3-5 nucleotides in length, 5-8 nucleotides in length, 8-10 nucleotides in length, or 10-15 nucleotides in length. In some embodiments, the gene-level barcode can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the gene-level barcode may be more than 20 nucleotides in length. For example, the gene-level barcode can be designed to be 5 nucleotides in length, in theory allowing for up to 4^5 (1024) genes to be targeted simultaneously. In practice, the number of usable barcodes, given a barcode length of k bases, may be less than 4^(k) due to the limitations of probe design and synthesis, which are discussed in the section on array synthesis. By designing RT primers containing a gene-level barcode on the 5′ end, sequencing by ligation may be in the more efficient 5′ direction. Moreover, by sequencing additional bases beyond the barcode we may acquire endogenous cDNA sequence, as the bases 5′ of the barcode on the sequencing template correspond to the last RNA-templated bases of the linear cDNA (see FIG. 1C).
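To illustrate why the usable barcode count can fall below 4^(k), the sketch below enumerates all barcodes of a given length and discards those violating a design constraint. The specific constraint used here (no run of three or more identical bases) is an assumption chosen purely for illustration; the actual design and synthesis limitations are those discussed in the section on array synthesis, and the function names are hypothetical.

```python
from itertools import product


def has_homopolymer(barcode, max_run=2):
    """True if any base repeats more than max_run times in a row."""
    run = 1
    for previous, current in zip(barcode, barcode[1:]):
        run = run + 1 if current == previous else 1
        if run > max_run:
            return True
    return False


def usable_barcodes(length, max_run=2):
    """All barcodes of the given length that pass the illustrative filter."""
    candidates = ("".join(bases) for bases in product("ACGT", repeat=length))
    return [bc for bc in candidates if not has_homopolymer(bc, max_run)]


usable = usable_barcodes(5)
print(len(usable), "of", 4 ** 5, "five-base barcodes pass the filter")
```

Additional filters, such as a minimum pairwise Hamming distance to support error detection, would reduce the usable set further.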

Targeted FISSEQ libraries for a number of sets of genes can be synthesized, including the clinically relevant Oncotype Dx, in order to demonstrate the use of FISSEQ for making diagnostic and prognostic determinations for human cancer. See e.g., Cronin, Maureen, et al. “Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor-positive breast cancer.” Clinical chemistry 53.6 (2007): 1084-1091. Targeted RNA FISSEQ datasets can be analyzed for other applications, including cellular developmental programming and detection of individual neuronal connections for connectomic reconstruction.

RT efficiency can be dependent on various factors including the primer sequence, RNA secondary structure, and/or polysome and protein occupancy on the RNA. In some instances, a unique probe barcode sequence may be coupled or incorporated into RT primers. By incorporating a unique probe barcode sequence into each RT primer, the barcode can be utilized to directly measure both the capture efficiency and specificity of each primer. Because each molecule of template is likely to acquire a unique barcode from the random barcode pool, the number of original transcripts that are transcribed can be counted by counting the number of unique barcodes. While this strategy can be useful for empirically screening primers and investigating the logic of efficient in situ RNA sequence capture by RT, barcoding each probe can reduce the overall number of genes that can be simultaneously detected given a particular barcode length. This limitation can be mitigated during primer design by using the first x bases for the gene-level barcode (theoretical maximum 4^(x) genes), then the next y bases for the probe barcode (theoretical maximum 4^(y) probes per gene), with the probe barcodes being degenerate at the gene level. As used herein, a probe barcode may also be referred to as a sequence-level barcode, or a sequence barcode. In some instances, the gene-level barcode and the probe barcode may be contiguous. Optionally, the gene barcode and the probe barcode may or may not share common nucleotide bases.
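The two-level barcode scheme can be sketched as a simple decoding step: the first x bases of a read identify the gene and the next y bases identify the probe within that gene. The sketch below assumes contiguous, non-overlapping barcodes and error-free reads; the function names and example reads are hypothetical.

```python
from collections import Counter


def split_barcode(read, x, y):
    """Split a read into (gene_barcode, probe_barcode).

    The first x bases are the gene-level barcode and the next y bases
    are the probe barcode; any further bases are ignored.
    """
    if len(read) < x + y:
        raise ValueError("read is shorter than the combined barcode length")
    return read[:x], read[x:x + y]


def count_captures(reads, x, y):
    """Count amplicons per (gene_barcode, probe_barcode) pair."""
    return Counter(split_barcode(read, x, y) for read in reads)


# Hypothetical reads: 3-base gene barcode followed by 2-base probe barcode.
reads = ["ACGTA", "ACGTA", "ACGCC", "TTGTA"]
counts = count_captures(reads, x=3, y=2)
for (gene, probe), n in sorted(counts.items()):
    print(gene, probe, n)
```

Because the probe barcodes are degenerate at the gene level, the same probe barcode (here "TA") can be reused under different gene barcodes without ambiguity, and per-probe counts within a gene give a direct readout of relative capture efficiency.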

In one example, we designed initial RT primer optimization libraries specifically targeting the beta actin mRNA and the ribosomal RNA 18S. For probes targeting 18S rRNA, priming in helix 19 exhibited the greatest efficiency, although we also had high efficiency targeting helix 22. This result agrees in part with previous studies which related to measurement of 18S rRNA accessibility to FISH. See e.g., Applied and Environmental Microbiology 69.3 (2003): 1748-1758. The probe barcode can be of various lengths, for example, 1-3 nucleotides in length, 3-5 nucleotides in length, 5-8 nucleotides in length, 8-10 nucleotides in length, or 10-15 nucleotides in length. In some embodiments, the probe barcode can be 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In some embodiments, the probe barcode may be more than 20 nucleotides in length.

Capture by Circularization

In some instances, the probe capturing a target can be circularized. In some embodiments, the probe capturing a target can be circularized after reverse transcription. In some embodiments, the probe capturing a target can comprise sequence regions at both 5′ and 3′ ends of the probe that are complementary to a target sequence so that it circularizes after hybridizing to the target. In some embodiments, the probe capturing a target can be circularized with ligase. In some embodiments, the probe capturing a target can comprise a gap region after hybridizing to a target. In some embodiments, the probe capturing a target can be circularized by using a nucleic acid polymerase to fill in the gap region, followed by ligase.

In order to amplify a molecular probe using RCA, a circular DNA template may be required. A circular probe may be localized to a target RNA using various methods. For example, RT followed by ssDNA circularization is one method of localizing a circular probe molecule to a target RNA. Two other “capture by circularization” techniques, using padlock probes and molecular inversion probes (MIPs), may avoid circularization of ssDNA in favor of splint ligation (FIG. 3). Both types of probes (e.g., padlock probes and/or MIPs) may contain sequence regions at both 5′ and 3′ ends of the probe that are complementary to a target sequence. MIPs may contain a gap between the 5′ and 3′ ends that is filled by a polymerase or ligase, incorporating templated bases or complementary oligonucleotides into the probe. The MIP may be circularized by a ligase when the 3′ end meets the 5′ end. See e.g., Hardenbol, Paul, et al. “Multiplexed genotyping with sequence-tagged molecular inversion probes.” Nature biotechnology 21.6 (2003): 673-678. For padlock probes, the 5′ and 3′ ends may be adjacent to one another when annealed, and may be ligated directly. See e.g., Nilsson, Mats, et al. “Padlock probes: circularizing oligonucleotides for localized DNA detection.” Science 265.5181 (1994): 2085-2088. In both cases, the DNA ligase may enforce specificity during circularization of the probe by virtue of its mismatch sensitivity. Capture by circularization can be achieved by either targeting the RNA directly, or by targeting cDNA.
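The geometry of padlock probes and MIPs can be sketched by deriving the two target-complementary arms from a target sequence. In this minimal sketch, assuming a target written 5′ to 3′, the probe's 5′ arm is the reverse complement of the target region upstream of the junction and the 3′ arm is the reverse complement of the downstream region; a gap of zero models a padlock probe and a positive gap models a MIP. The function names, backbone, and example target are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def revcomp(seq):
    """DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_arms(target, junction, arm_len, gap=0):
    """Return (five_prime_arm, three_prime_arm) of a circularizable probe.

    junction: number of target bases lying 5' of the ligation site.
    gap: target bases left uncovered between the arms (0 = padlock, >0 = MIP).
    """
    upstream_start = junction - arm_len
    downstream_end = junction + gap + arm_len
    if upstream_start < 0 or downstream_end > len(target):
        raise ValueError("arms extend beyond the target sequence")
    five_prime_arm = revcomp(target[upstream_start:junction])
    three_prime_arm = revcomp(target[junction + gap:downstream_end])
    return five_prime_arm, three_prime_arm


def linear_probe(target, junction, arm_len, backbone, gap=0):
    """Linear probe, 5'->3': 5' arm, backbone (adaptor/barcode), 3' arm."""
    five, three = design_arms(target, junction, arm_len, gap)
    return five + backbone + three


# Hypothetical target used only to check the geometry.
target = "AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGC"
five, three = design_arms(target, junction=16, arm_len=10)
print(three + five == revcomp(target[6:26]))  # arms abut across the junction
```

After annealing, the 3′ end of the 3′ arm sits next to the 5′ end of the 5′ arm, so ligation joins them into a continuous complement of the target footprint; with a positive gap, a polymerase or gap-fill oligonucleotide would first supply the complement of the uncovered bases.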

RNA Splint Ligation

In some embodiments, the probe can be ligated through splint ligation by a ligase. In some embodiments, the ligase may be T4 DNA ligase or SplintR.

Both padlock probes and MIPs can be hybridized directly to RNA. Targeting RNA molecules directly for capture by circularization may have the advantage of not requiring cDNA synthesis. T4 DNA ligase, in addition to its conventional activity on DNA substrates, can catalyze the ligation of nicked DNA within a hybrid DNA:RNA duplex. See e.g., Nilsson, Mats, et al. “Enhanced detection and distinction of RNA by enzymatic probe ligation.” Nature biotechnology 18.7 (2000): 791-793; Christian, Allen T., et al. “Detection of DNA point mutations and mRNA expression levels by rolling circle amplification in individual cells.” Proceedings of the National Academy of Sciences 98.25 (2001): 14238-14243; Nilsson, Mats, et al. “RNA-templated DNA ligation for transcript analysis.” Nucleic acids research 29.2 (2001): 578-581. In some instances, ligases such as SplintR (NEB) may catalyze this ligation with even greater efficiency. See e.g., Lohman, Gregory J S, et al. “Efficient DNA ligation in DNA-RNA hybrid helices by Chlorella virus DNA ligase.” Nucleic acids research 42.3 (2014): 1831-1844. SplintR may in some instances exhibit good mismatch detection properties. Optionally, properties such as the good mismatch detection properties of DNA ligase may be used to capture a MIP directly on an RNA molecule, ligating in short oligonucleotides to fill the gap. In some instances, a MIP may be used against an RNA target, using a reverse transcriptase or DNA polymerase with reverse transcriptase activity to incorporate RNA-templated bases into the circular probe. See e.g., Moser, Michael J., et al. “Thermostable DNA polymerase from a viral metagenome is a potent RT-PCR enzyme.” PLoS One 7.6 (2012): e38371.

A polymerase may be used to generate RCA products from a hybrid RNA:circular-DNA complex described above. For example, a Phi29 DNA polymerase can synthesize an RCA product from the hybrid RNA:circular-DNA complex by digestion of the RNA molecule via 3′ exonuclease activity until the 3′ end primes the circular template. Previous demonstrations of this approach, however, have been restricted to sequences near the 3′-end of the RNA. See e.g., Stougaard, Magnus, et al. “In situ detection of non-polyadenylated RNA molecules using Turtle Probes and target primed rolling circle PRINS.” BMC biotechnology 7.1 (2007): 1; Lagunavicius, Arunas, et al. “Duality of polynucleotide substrates for Phi29 DNA polymerase: 3′→5′ RNase activity of the enzyme.” RNA 14.3 (2008): 503-513. In some instances, the 3′ end of the RNA may be required to be in close proximity to the circular template, and RNA secondary structure formation may inhibit Phi29 3′ exonuclease activity, preventing it from degrading longer stretches of RNA and priming RCA. This limitation could be avoided, for example, by introducing a separate RCA priming region on the probe as disclosed in the present disclosure.

cDNA Splint Ligation

In some embodiments, the RNA target can be first reverse transcribed into a cDNA molecule, either specifically or non-specifically, and then the cDNA molecule can be captured by a probe. In some instances, the probe may be a MIP or padlock probe described herein. Ligation of dsDNA using DNA ligase may occur with greater efficiency and/or with higher mismatch sensitivity than ligation of a hybrid duplex. See e.g., Lagunavicius, Arunas, et al. “Duality of polynucleotide substrates for Phi29 DNA polymerase: 3′→5′ RNase activity of the enzyme.” RNA 14.3 (2008): 503-513; Bullard, Desmond R., and Richard P. Bowater. “Direct comparison of nick-joining activity of the nucleic acid ligases from bacteriophage T4.” Biochemical Journal 398.1 (2006): 135-144. See also Sriskanda, Verl, and Stewart Shuman. “Specificity and fidelity of strand joining by Chlorella virus DNA ligase.” Nucleic acids research 26.15 (1998): 3536-3541. In some instances, ligation of dsDNA using DNA ligase may occur with k_(cat)/K_(m) equal to or greater than 10, 10², 10³, 10⁴, 10⁵, 10⁶, 10⁷ or any value therebetween. Both MIP and/or padlock probe capture against cDNA can be used for multiplex detection of RNA molecules in situ. See e.g., Ke, Rongqin, et al. “In situ sequencing for RNA analysis in preserved tissue and cells.” Nature methods 10.9 (2013): 857-860; Mignardi, Marco, et al. “Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ.” Nucleic acids research 43.22 (2015): e151-e151. Due to the use of reverse transcription, however, these strategies may have limited capture efficiency (per-molecule capture efficiency ≤30%). See e.g., Larsson, Chatarina, et al. “In situ detection and genotyping of individual mRNA molecules.” Nature methods 7.5 (2010): 395-397. In some embodiments, the RT primer region of the probe can comprise nucleotide analogs such as LNA bases. 
RT primers containing LNA bases (Exiqon) may be used to increase hybridization efficiency in situ and also increase amplicon density for downstream padlock probe capture. See e.g., Larsson, Chatarina, et al. “In situ detection and genotyping of individual mRNA molecules.” Nature methods 7.5 (2010): 395-397; Ke, Rongqin, et al. “In situ sequencing for RNA analysis in preserved tissue and cells.” Nature methods 10.9 (2013): 857-860. LNA-containing primers can also be designed for resistance to RNase H such that the localization of the cDNA can be maintained by annealing to the cross-linked RNA molecule. See e.g., Kurreck, Jens, et al. “Design of antisense oligonucleotides stabilized by locked nucleic acids.” Nucleic acids research 30.9 (2002): 1911-1918; Ke, Rongqin, et al. “In situ sequencing for RNA analysis in preserved tissue and cells.” Nature methods 10.9 (2013): 857-860. LNA modifications can be incorporated into the synthesis of a nucleic acid probe or introduced during PCR for amplification of LNA-RT primers from array-synthesized oligonucleotide libraries. See e.g., Veedu, Rakesh N., Birte Vester, and Jesper Wengel. “Enzymatic incorporation of LNA nucleotides into DNA strands.” ChemBioChem 8.5 (2007): 490-492.

Without Target-Molecule Splint Ligation

In some embodiments, it may be possible to circularize a linear capture probe in situ, targeted either to RNA or cDNA, but using another oligonucleotide as the splint for ligation, rather than the target nucleic acid. This strategy may rely on hybridization alone to confer specificity of the probe-target interaction, but may avoid some problems that arise when using the target molecule as the splint. For example, if both the probe and the splint oligonucleotide are composed of DNA, the splint ligation reaction is likely to be efficient.

In some embodiments, hybridizing circular probes directly to the RNA molecule can be used, avoiding the in situ circularization step entirely. In some instances, potential problems can arise from this strategy due to the different properties of circular ssDNA compared to linear ssDNA. Linear ssDNA may be a flexible polymer, with a persistence length on the order of a few nanometers. See e.g., Smith, Steven B., Yujia Cui, and Carlos Bustamante. “Overstretching B-DNA: the elastic response of individual double-stranded and single-stranded DNA molecules.” Science 271.5250 (1996): 795; Tinland, Bernard, et al. “Persistence length of single-stranded DNA.” Macromolecules 30.19 (1997): 5763-5765. Circular ssDNA may have a significantly longer persistence length. See e.g., Rechendorff, Kristian, et al. “Persistence length and scaling properties of single-stranded DNA adsorbed on modified graphite.” The Journal of chemical physics 131 (2009): 095103. This may reduce the diffusion rate of the probe into the FISSEQ hydrogel, as a circular ssDNA typically migrates more slowly than the linear molecule during acrylamide gel electrophoresis. In some cases, circularization of one DNA molecule onto another linear DNA molecule may topologically link the molecules. It may be possible to hybridize a circular DNA molecule with a linear DNA molecule without introducing a strand break, i.e., without topologically linking them, by the region of the circular DNA not participating in base-pairing winding back around the target strand the same number of times as the number of helical turns of the duplex. In some embodiments, hybridization of a circularized probe to a target can exhibit greater specificity than a linear probe, possibly due to the bending force or tension created by the locked circular conformation. See e.g., Tang, Yaqin, et al. “Tension promoted circular probe for highly selective microRNA detection and imaging.” Biosensors and Bioelectronics 85 (2016): 151-156. 
These molecules, however, are not constrained by topology due to the short length of the miRNA.

In some embodiments, a target can be bound by a number of probes. Because RCA massively amplifies the probe, any probe bound off-target or retained non-specifically within the sample can generate a false positive capture event. In some instances, it may be difficult to determine, using only a single probe, whether an amplified probe is localized to a target molecule. However, strategies for error detection and error correction using multiple capture probes per molecule can be devised. For example, the complete barcode sequence might be distributed among a number of probes. The likelihood of spatially co-localized off-target binding or non-specific retention of multiple probes corresponding to the same target molecule is small.
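One way to picture a barcode distributed among several probes is sketched below: a target is called only when every barcode fragment is observed at the same location. This is a simplified illustration rather than a described implementation; the function names are hypothetical, and the probability estimate assumes that false positive events for different probes are statistically independent, which may not hold in a real sample.

```python
def reconstruct_barcode(fragments_seen, n_fragments):
    """Rebuild a barcode from fragments co-localized at one position.

    fragments_seen maps fragment index (0-based) to the fragment sequence
    read from the probe carrying it. Returns None unless all fragments
    are present, so isolated off-target probes are rejected.
    """
    if set(fragments_seen) != set(range(n_fragments)):
        return None
    return "".join(fragments_seen[i] for i in range(n_fragments))


def colocalized_false_positive(p_single, n_probes):
    """Chance that n_probes false positives coincide at one location.

    ASSUMPTION: each probe's false positive occurs independently with
    probability p_single at the location in question.
    """
    return p_single ** n_probes


print(reconstruct_barcode({0: "AC", 1: "GT", 2: "TA"}, 3))
print(reconstruct_barcode({0: "AC", 2: "TA"}, 3))
print(colocalized_false_positive(0.01, 3))
```

Requiring all fragments is the strictest rule; a tolerant variant could accept a subset of fragments and use the redundancy for error correction instead of rejection.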

Using Proteins which Exhibit Binding or Other Reactivity to Nucleic Acids

In some embodiments, the probe capturing a target can be a probe complex comprising a protein component and a nucleic acid component. The protein component can facilitate binding of the probe onto the target.

In some embodiments, nucleic-acid binding proteins, such as Cas9, can be used to design targeted RNA FISSEQ methods. In some other embodiments, proteins that catalyze nucleic acid reactions other than binding, such as cleavage or ligation, may be directed to the detection of certain RNA or cDNA species. Completely programmable RNA-binding proteins can be generated using concatamers of engineered Pumilio homology domains which may be linked to a nucleic acid label (as by mRNA display, ribosome display, or conjugation of a nucleic acid tag). See e.g., Adamala, Katarzyna P., Daniel A. Martin-Alarcon, and Edward S. Boyden. “Programmable RNA-binding protein composed of repeats of a single modular unit.” Proceedings of the National Academy of Sciences 113.19 (2016): E2579-E2588.

In some embodiments, nucleic acid-guided nucleic acid binding proteins can be used. Exemplary nucleic acid-guided nucleic acid binding proteins may include Argonaute, Cas9, Cpf1, and/or C2c2. See e.g., Bouasker, Samir, and Martin J. Simard. “Structural biology: tracing Argonaute binding.” Nature 461.7265 (2009): 743-744; Mali, Prashant, et al. “RNA-guided human genome engineering via Cas9.” Science 339.6121 (2013): 823-826; Zetsche, Bernd, et al. “Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system.” Cell 163.3 (2015): 759-771; Abudayyeh, Omar O., et al. “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector.” Science (2016): aaf5573. Nucleic acid-guided nucleic acid binding proteins, such as Argonaute and C2c2, can provide even greater flexibility as a single protein is capable of binding a diverse array of targets using a guide nucleic acid to confer specificity. These affinity-binding reactions can be used to localize a barcode nucleic acid, such as an initiator of hybridization chain reaction (HCR), a linear or circular DNA label for rolling circle amplification (RCA), or another type of detectable label, to a target RNA molecule. In the case of the nucleic acid-guided nucleic acid binding proteins, the guide strand of RNA or DNA could be detected instead of the target nucleic acid molecule. The guide nucleic acid molecules may be synthesized and amplified from a microarray.

Proteins which catalyze reactions of nucleic acids may be used to affect certain species of RNA or cDNA. For example, nucleic acid cleavage and/or ligation reactions may be used to modify certain RNA or cDNA species for detection in situ. An RNA or cDNA molecule can serve as a template for “Tagmentation”, wherein the molecule is cleaved and an exogenous nucleic acid is ligated onto one of the fragments. See e.g., Syed, Fraz. “Second-Generation Sequencing Library Preparation: In Vitro Tagmentation via Transposome Insertion.” Tag-Based Next Generation Sequencing: 311-321. The nucleic acid reactions can be directed to certain species using the binding specificity of the probe complex, comprising the protein component and/or a “guide” nucleic acid component. In some embodiments, certain species of RNA or cDNA which are not targets of interest can be modified using the strategies described herein in order to be eliminated from detection. For example, unwanted targets can be degraded by various exonucleases or endonucleases, such as RNase H. In some embodiments, certain species of RNA or cDNA may be modified.

Formation of Linear Polymerase Colonies for Detection

In some embodiments, the probe capturing a target comprises a common adaptor region for further amplification. In some embodiments, the probe or circularized probe can be amplified by RCA. In some embodiments, a linear amplification product, such as a polony, can be prepared.

As an alternative to rolling circle amplification (RCA), linear polymerase colonies (“polonies”) may be formed using a linear substrate. Polonies may be generated as described in Mitra, Robi D., et al. “Fluorescent in situ sequencing on polymerase colonies.” Analytical biochemistry 320.1 (2003): 55-65, which is incorporated by reference herein. Polonies may comprise RNA, cDNA, RNA or cDNA modified in situ by enzymes, nucleic acid probes hybridized to RNA or cDNA, or nucleic acid probes localized to a target RNA or cDNA species by a nucleic acid-protein complex, such as those listed above. Polonies may be subsequently detected via FISSEQ.

To summarize the RNA capture, in various embodiments, the disclosure provides that reverse transcription may be targeted to certain RNA species by synthesizing probes comprising reverse transcription primers comprising sequence complementary to RNA sequence, which act to specifically prime cDNA synthesis of those RNA species. These probes containing RT primers can subsequently be processed into a FISSEQ library.

In some embodiments, the disclosure provides that RNA species can be detected in situ wherein the final product of the probe is a circular DNA molecule, which acts as a template for rolling circle amplification (RCA), for detection via FISSEQ. In some embodiments, the capture probes can be linear probes, which are hybridized to an RNA or cDNA molecule and circularized by a ligase when annealed to complementary RNA or cDNA sequence. In some embodiments, the capture probes can be linear probes, which are hybridized to an RNA or cDNA molecule and circularized after endogenous RNA or cDNA sequence is used as a sequence template, filling in a “gap” in the linear probe, as by a reverse transcriptase, DNA polymerase, or ligase, such that the probe may be ligated into a circle. In some embodiments, the capture probes can be linear probes, which are hybridized to an RNA or cDNA molecule and circularized by a ligase using an additional “splint” oligonucleotide independent of the target RNA or cDNA molecule. In some embodiments, the capture probes can be circular probes, which are hybridized to an RNA or cDNA molecule.

In some embodiments, the present disclosure provides methods to capture an RNA target by a probe complex comprising a protein component and a nucleic acid component. The disclosure provides that RNA species can be detected in situ wherein a nucleic acid-protein complex is directed to a certain RNA or cDNA species via the binding properties of the protein, such as concatamers of engineered Pumilio homology domains, localizing a nucleic acid label to the target RNA or cDNA species. In some embodiments, the nucleic acid component of the probe complex is formed by mRNA display, ribosome display, reverse transcription and subsequent coupling of the cDNA to the protein, or other forms of linkages between a nucleic acid and protein. In some embodiments, the protein component of the probe complex is expressed from DNA or mRNA synthesized in part or full using DNA microarray technology. In some embodiments, the nucleic acid component of the probe complex is synthesized in part or full using DNA microarray technology. In some embodiments, both the nucleic acid component and the protein component of the probe complex are synthesized in part or full using DNA microarray technology. For example, the probe complex can be labeled by mRNA display, wherein an mRNA synthesized in part or in full using DNA microarray technology can direct the protein synthesis and constitute the nucleic acid label.

The disclosure provides that RNA species can be detected in situ wherein a protein or nucleic acid-protein complex is directed to certain RNA or cDNA species via the nucleic acid sequence (the “guide” nucleic acid). Exemplary nucleic acid-guided binding proteins include, but are not limited to, Argonaute, Cas9, and C2c2. In some embodiments, the nucleic acid component of a probe is directed to the target RNA or cDNA by the protein (e.g., using a nucleic acid tagged protein). In some embodiments, the nucleic acid component of the probe is added to the targeting portion of the “guide” nucleic acid, or constitutes the “guide” nucleic acid. In some embodiments, the nucleic acid component of the probe comprises a linear DNA, which is circularized as a template for rolling circle amplification (RCA). In some embodiments, the nucleic acid component of the probe comprises a linear RNA, which is specifically captured using the methods described herein, or non-specifically captured using random capture FISSEQ (e.g. Lee et al., Science 2014). In some embodiments, the nucleic acid component of the probe comprises a circular DNA, which can be amplified using RCA. In some embodiments, the nucleic acid component of the probe comprises DNA or RNA, which serves as a detectable label by in situ hybridization (ISH), fluorescent in situ hybridization (FISH), hybridization chain reaction (HCR), or cyclic hybridization chain reaction (CHCR).

The disclosure provides that RNA species can be detected in situ wherein one or more nucleic acid reactions are directed to one or more RNA or cDNA species, including cleavage, ligation, modification, such as end modifications (e.g. 5′ phosphorylation), and protection. In some embodiments, the reaction is directed to certain RNA or cDNA species by the binding specificity of the probe complex, comprising the protein component and/or a “guide” nucleic acid component. In some embodiments, one or more components of the complex are synthesized using DNA microarray. In some embodiments, the target RNA or cDNA sequences are subsequently processed into a FISSEQ template for detection via sequencing.

In some embodiments, the disclosure provides that RNA species can be detected in situ wherein a linear polymerase colony (polony) is formed and detected.

In various embodiments, the target molecule is a cDNA molecule synthesized from an RNA molecule in situ, wherein the cDNA molecule may be reverse transcribed using targeted RT primers, such as those complementary to certain RNA species, or untargeted (random) RT primers, such as random hexamers or poly(dT) primers.

In various embodiments, the nucleic acid probes or nucleic acid components of the probe complex are synthesized in part or in full using DNA microarray synthesis technology. In some embodiments, the probes or the nucleic acid components of the probes can be amplified and subsequently “matured” into functional probes using methods described herein.

In some embodiments, the nucleic acid probes or nucleic acid components of the probe complexes comprise locked nucleic acid (LNA) bases or other nucleic acid analogs. In some embodiments, the modified nucleic acid probes can function to enhance the kinetics, efficiency, or specificity of hybridization or direction of the probe-target molecule interaction. The incorporation of LNA or nucleic acid analog bases can be done, for example, by PCR, RT, or during the enzymatic amplification and “maturation” of the probe.

In various embodiments, a plurality of probes for targeted FISSEQ are synthesized and/or used simultaneously.

Various methods can be used to detect nucleic acid sequences. In some embodiments, detection of the RNA species is enabled by detection of nucleic acid sequence templated from RNA or cDNA by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH). In some embodiments, the detection of targeted RNA species is enabled by the detection of nucleic acid sequence contained in the probe or nucleic acid component of the probe complex by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH); e.g. “barcode” sequencing. In some other embodiments, the detection of the targeted RNA species comprises detection of both nucleic acid sequence templated from RNA or cDNA and nucleic acid sequence contained in the probe or nucleic acid component of the probe complex, by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH).

The disclosure provides that RNA species can be detected in situ wherein one or more nucleic acid reactions are directed to one or more RNA or cDNA species, including degradation, modification, such as end modifications (e.g. 2′O-methyl addition), and de-protection. In some embodiments, the reaction is directed to certain RNA or cDNA species by the binding specificity of the probe complex, comprising nucleic acid component synthesized using DNA microarray. In some embodiments, unwanted target RNA or cDNA sequences are subsequently depleted from the sample, such that they are not represented in the subsequent FISSEQ library and the datasets. In some embodiments, the specific depletion is mediated by RNase H digestion of an RNA:DNA hybrid duplex. In some embodiments, the specific depletion is mediated by Cas9 or other protein-nucleic acid complex. In some embodiments, the RNA species targeted for selective degradation comprise ribosomal RNA (rRNA) or transfer RNA (tRNA). In some embodiments, oligonucleotides can be used to block extension of a reverse transcriptase. In some embodiments, oligonucleotides can be used to block access to the unwanted target RNA by another RNA probe complex. Exemplary RNA species targeted for selective degradation can comprise housekeeping genes. Housekeeping genes may be constitutive genes that are required for the maintenance of basic cellular function. In some embodiments, the unwanted target RNA species targeted for selective degradation comprise genes expressed at an average RNA abundance greater than a certain level. Optionally, unwanted target RNA species targeted for selective degradation may comprise genes expressed at an average RNA abundance equal to or greater than about 100 RNA molecules per cell, 200 RNA molecules per cell, 400 RNA molecules per cell, 600 RNA molecules per cell, 800 RNA molecules per cell, 1000 RNA molecules per cell, 1200 RNA molecules per cell, 1400 RNA molecules per cell, 1600 RNA molecules per cell, 1800 RNA molecules per cell, 2000 RNA molecules per cell, 10000 RNA molecules per cell or more (e.g., where rRNA is present at up to 10 million copies per cell), or any value therebetween.
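As a minimal sketch of the abundance-based selection described above, the following function picks depletion targets from a table of average copies per cell. The function name, the default threshold, the table layout, and the idea of an always-deplete list are hypothetical conveniences for illustration; the abundance values a user would supply must come from their own expression data.

```python
def select_depletion_targets(abundance, threshold=1000.0, always_deplete=()):
    """Return the species to deplete before library construction.

    abundance      -- dict mapping species name to average RNA copies per cell
    threshold      -- species at or above this abundance are depleted
    always_deplete -- names depleted regardless of abundance (e.g., rRNA, tRNA)
    """
    forced = set(always_deplete)
    return sorted(
        name
        for name, copies in abundance.items()
        if copies >= threshold or name in forced
    )
```

The threshold corresponds to the "equal to or greater than about N RNA molecules per cell" language above, so any of the listed values could be passed in place of the default.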

Targeted DNA Capture

All methods described in the previous section “Targeted RNA Capture” may be applied to the detection of specific DNA sequences in situ, with reasonable substitution of “RNA” or “cDNA” by “DNA”, and concomitant exchange, where appropriate, of relevant enzyme or protein components of the detection scheme for corresponding enzymes or proteins which catalyze the corresponding reactions for DNA substrates.

These methods may be accompanied by additional sample treatment steps, such as denaturing dsDNA into ssDNA for the purpose of enabling hybridization of a nucleic acid probe or nucleic acid-protein probe complex.

Capture by circularization methods may enable detection of single probes, corresponding to tens of genomic bases, which can be highly amplified using RCA. The large size of the RCA amplicon, typically a few hundred nanometers, may reduce the spatial resolution that can be achieved, even when imaging with super-resolution microscopy techniques. Capture by circularization may in some instances enable detection of single nucleotide variation for mutation detection and haplotyping at arbitrary scales.

The methods described herein for using nucleic acid binding proteins to detect RNA could also be used to detect DNA. Programmable DNA binding proteins include, but are not limited to, meganucleases, zinc fingers, transcription activator-like effectors, and BurrH. Nucleic acid-guided nucleic acid binding proteins, such as Argonaute, Cas9, Cpf1, and C2c2, may in some instances provide greater flexibility as a single protein is capable of binding a diverse array of targets using a guide nucleic acid to confer specificity. In some instances, as the protein may change the energy landscape of binding (e.g., as compared to nucleic acid hybridization alone), these methods can have faster kinetics and greater sensitivity to sequence mismatches.

The most information-rich method of genomic FISSEQ is the direct analog of next generation sequencing (NGS), in which genomic sequence is used to template the construction of a sequencing library. Preparation of genomic sequencing libraries for NGS may involve the steps of fragmentation, end-repair, adaptor ligation, and/or PCR. However, some of these steps can be achieved simultaneously (e.g., as with Illumina's Nextera method), by using a transposase enzyme to fragment the DNA and ligate adaptors in a single reaction called “tagmentation”. Existing NGS RNA-seq protocols can be adapted with modifications to RNA FISSEQ and genomic FISSEQ.

In some instances, enzymatic and/or chemical means of genomic fragmentation can be compatible with FISSEQ. Optionally, physical methods of fragmenting DNA may not be compatible with FISSEQ. The physical methods may include acoustic shearing and sonication, which may damage other aspects of the sample. DNase I or Fragmentase, a two enzyme mix (New England Biolabs), may in some instances be effective for NGS library construction. Within fixed biological specimens, DNA may also be naturally fragmented by a number of factors, including decomposition of apurinic/apyrimidinic sites formed by low pH formalin and environmental conditions during storage. The ends of DNA fragments can be conditioned for adaptor ligation using a number of enzymatic treatments, including blunt-ending and 5′ phosphorylation by T4 polynucleotide kinase, T4 DNA polymerase, and Klenow Large Fragment, and 3′ A-tailing by Taq polymerase or Klenow Fragment (exo-).

In contrast to most genomic NGS libraries, preparative PCR to amplify the library may not be necessary. As a result, each sequence that can be detected may correspond 1:1 with a genomic fragment, avoiding the need for unique molecular identifiers or other techniques to disambiguate PCR clones. Instead, the sequencing templates for amplification by RCA can be circularized. If the fragments are modified on both ends with known adaptor sequences, the molecules can be circularized using splint ligation. Alternatively, new adaptor ligation and circularization strategies specifically tailored to FISSEQ can be devised. For example, library construction protocols involving hairpin ligation as a mechanism of circularization can be imagined. For example, hairpin nucleic acid molecules may be ligated to both strands at each end of a dsDNA fragment, serving to circularize the fragment as a template for RCA.

A strategy based on Illumina's Nextera method can be used, but using Cas9 instead of a DNA transposase to target the library construction to particular genomic sequences. This may allow fine tailoring of the fragment size for library construction, as well as enrichment for loci of interest. Using this method, the Cas9 sensitivity to nucleosomes may also provide additional information about chromatin state. Detection of methylation can also be performed. For example, fragmentation by restriction enzymes sensitive to CpG methylation, a form of in situ methylation sensitive restriction enzyme sequencing (MRE-seq), can be used. BS-seq and MethylC-seq, which use sodium bisulfite treatment to convert unmethylated cytosine to uracil while methylated cytosines are protected, with changes detected relative to the reference genome sequence, can also be adapted to FISSEQ.
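The bisulfite logic described above can be illustrated with a short sketch. It models a single strand only, assumes that uracil is read as thymine after amplification and sequencing, and assumes the read is already aligned base-for-base with the reference. Both function names and the position-indexed representation of methylation are hypothetical and are offered only as an illustration.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Convert unmethylated C to U; methylated C (by 0-based index) is protected."""
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, read):
    """Compare an aligned read to the reference at every reference cytosine.

    Returns {position: True} where the C was retained (methylated) and
    {position: False} where it reads as T (converted, unmethylated).
    Other mismatches are ignored in this simplified model.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls
```

For example, converting a reference with one protected cytosine, replacing U with T to mimic the sequenced product, and passing the result to `call_methylation` recovers the protected position as methylated and the remaining cytosines as unmethylated.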

To summarize targeted DNA FISSEQ, in various embodiments, the disclosure provides that DNA species can be detected in situ. In some embodiments, the final product of the probe for DNA FISSEQ is a circular DNA molecule, which acts as a template for rolling circle amplification (RCA), for detection via FISSEQ. In some embodiments, the probes for DNA target capture are linear probes, which are hybridized to a DNA molecule and circularized by a ligase when annealed to complementary DNA sequence. In some embodiments, the probes for DNA target capture are linear probes, which are hybridized to a DNA molecule and circularized after endogenous DNA sequence is used as a sequence template, filling in a “gap” in the linear probe, as by a DNA polymerase, or ligase, such that the probe may be ligated into a circle. In some embodiments, the probes for DNA target capture are linear probes, which are hybridized to a DNA molecule and circularized by a ligase using an additional “splint” oligonucleotide independent of the target DNA molecule. In some embodiments, the probes for DNA target capture are circular probes, which are hybridized to a DNA molecule.
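The geometry of a linear probe that is circularized by a ligase on its target (the first linear-probe configuration above, with no gap) can be sketched as follows. The sketch assumes a DNA target written 5′ to 3′, a ligation junction given as a 0-based index, equal-length arms, and an arbitrary backbone string standing in for adaptor and barcode sequence. The function names and parameters are hypothetical, and the sketch ignores melting temperature, secondary structure, and specificity screening.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def design_padlock(target, junction, arm_length, backbone):
    """Build a linear probe (5'->3') whose two ends abut at `junction` on the target.

    The probe's 5' arm pairs with the target bases just upstream of the
    junction and its 3' arm pairs with the bases just downstream, so the two
    probe ends sit side by side on the target and can be joined by a ligase.
    """
    start, end = junction - arm_length, junction + arm_length
    if start < 0 or end > len(target):
        raise ValueError("arms extend beyond the target sequence")
    five_prime_arm = reverse_complement(target[start:junction])
    three_prime_arm = reverse_complement(target[junction:end])
    return five_prime_arm + backbone + three_prime_arm
```

A quick consistency check on any design is that the 3′ arm followed by the 5′ arm equals the reverse complement of the full target footprint, which is the sequence the closed circle presents to the target across the ligation junction.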

In some embodiments, a nucleic acid-protein complex is directed to a certain DNA species via the binding properties of the protein, such as concatamers of engineered Pumilio homology domains, localizing a nucleic acid label to the target DNA species. In some embodiments, the nucleic acid component of the probe complex is formed by mRNA display, ribosome display, reverse transcription and subsequent coupling of the cDNA to the protein, or other forms of linkages between a nucleic acid and protein. In some embodiments, the protein component of the probe complex is expressed from DNA or mRNA synthesized in part or full using DNA microarray technology. In some embodiments, the nucleic acid component of the probe complex is synthesized in part or full using DNA microarray technology. In some embodiments, both a nucleic acid component and protein component of the probe complex are synthesized in part or full using DNA microarray technology, e.g. as by labeling of the complex by mRNA display. In some embodiments, the probe complex can be labeled with an mRNA, wherein the mRNA synthesized in part or in full using DNA microarray technology both directs the protein synthesis and constitutes the nucleic acid component of the probe.

The disclosure provides that DNA species can be detected in situ wherein a protein or nucleic acid-protein complex is directed to certain DNA species via the nucleic acid sequence (the “guide” nucleic acid). Exemplary nucleic acid-guided binding proteins include Argonaute, Cas9, and C2c2. In some embodiments, a nucleic acid component of a probe is directed to the target DNA by the protein (e.g., using a nucleic acid tagged protein). In some embodiments, the nucleic acid component of a probe is added to the targeting portion of the “guide” nucleic acid, or constitutes the “guide” nucleic acid. In some embodiments, the nucleic acid component of the probe comprises a linear DNA, wherein the linear DNA is circularized as a template for rolling circle amplification (RCA). In some embodiments, the nucleic acid component of the probe comprises a linear RNA, wherein the linear RNA is specifically captured using the methods described herein, or non-specifically captured using random capture FISSEQ (e.g. Lee et al., Science 2014). In some embodiments, the nucleic acid component of the probe comprises a circular DNA, which is amplified using RCA. In some embodiments, the nucleic acid component of the probe comprises a DNA or RNA, wherein the DNA or RNA serves as a detectable label by in situ hybridization (ISH), fluorescent in situ hybridization (FISH), hybridization chain reaction (HCR), or cyclic hybridization chain reaction (CHCR).

In various embodiments, the present disclosure provides methods of forming an in situ DNA sequencing library (FISSEQ library) by contacting the sample with a plurality of probe complexes. In some embodiments, the probe complex comprises a ssDNA, dsDNA, ssRNA, or other nucleic acid. In some embodiments, the probe complex comprises a DNA transposase. In some embodiments, the probe complex comprises a Cas9 or other nucleic acid-directed nucleic acid-binding protein.

The disclosure provides that DNA species can be detected in situ wherein one or more nucleic acid reactions are directed to one or more DNA species, including cleavage, ligation, modification, such as end modifications (e.g. 5′ phosphorylation), and protection. In some embodiments, the reaction is directed to certain DNA species by the binding specificity of the probe complex, comprising the protein component and/or a “guide” nucleic acid component. In some embodiments, one or more components of the complex are synthesized using DNA microarray. In some embodiments, the target DNA sequences are subsequently processed into a FISSEQ template for detection via sequencing. In some embodiments, a linear polymerase colony (polony) is formed and detected in the targeted FISSEQ.

The probes disclosed herein can be synthesized by microarray. In some embodiments, the nucleic acid probes or nucleic acid components of the probe complex are synthesized in part or full using DNA microarray synthesis technology. In some embodiments, the synthesized probes can be amplified and subsequently “matured” into functional probes using methods described herein.

In some embodiments, a target dsDNA is converted into a single-stranded (ssDNA) target, as by thermal melting of the duplex and/or enzymatic digestion of one strand.

In some embodiments, the nucleic acid probes or probe complexes comprise one or more nucleotide analogs. In some embodiments, the nucleic acid probes or nucleic acid components of the probe complex comprise locked nucleic acid (LNA) bases or other nucleic acid analogs. In some embodiments, the modified probes can function to enhance the kinetics, efficiency, or specificity of hybridization or direction of the probe-target molecule interaction. The incorporation of LNA or nucleic acid analog bases can be done, for example, by PCR, RT, or during the enzymatic amplification and “maturation” of the probe.

In some embodiments, a plurality of probes are synthesized and/or used simultaneously. In some embodiments, the detection of the DNA species is enabled by detection of nucleic acid sequence templated from DNA by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH). In some embodiments, the detection of the DNA species comprises detection of nucleic acid sequence contained in the probe or nucleic acid component of the probe complex by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH); e.g. “barcode” sequencing. In some embodiments, the detection of the DNA species comprises detection of both nucleic acid sequence templated from DNA and nucleic acid sequence contained in the probe or nucleic acid component of the probe complex, by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH).

The disclosure provides that DNA species can be detected in situ wherein one or more nucleic acid reactions are directed to one or more DNA species, including degradation, modification, such as end modifications (e.g. 2′O-methyl addition), and de-protection. In some embodiments, the reaction is directed to certain DNA species by the binding specificity of the probe complex, comprising nucleic acid component synthesized using DNA microarray. In some embodiments, the target DNA sequences are subsequently depleted from the sample, such that they are not represented in the subsequent FISSEQ library and datasets. In some embodiments, the specific depletion is mediated by RNase H digestion of an RNA:DNA hybrid duplex. In some embodiments, the specific depletion is mediated by Cas9 or other protein-nucleic acid complex. In some embodiments, the DNA species targeted for selective degradation comprise repetitive sequences. In some embodiments, oligonucleotides can be used to block extension of a DNA polymerase or the reaction of a DNA ligase. In some embodiments, oligonucleotides can be used to block access to the target DNA by another probe complex.

Enhancing Hybridization for Capture

During investigation of the strategies for targeted RNA FISSEQ and genome FISSEQ, a problem can arise: unacceptably slow kinetics of the hybridization reaction between the probe and the target molecule within a biological sample. To address this, the present disclosure provides a method for enhancing a hybridization reaction in a cell or cellular matrix. In some embodiments, a crowding agent may be used to enhance hybridization for capture. In some embodiments, a crowding agent can be used, wherein the crowding agent enhances enzyme activity. In some embodiments, a crowding agent comprises a cleavable charged group, wherein the charged group can be cleaved off and washed away. In some embodiments, a crowding agent comprises a charged group, wherein the charged group can be neutralized after nucleic acid hybridization but before enzymatic reactions. In some embodiments, a crowding agent can be degraded.

Reaction Buffer Concentration Dependency

First, all of the designs for massively multiplex targeted FISSEQ involve using a complex pool of oligonucleotide probes. However, since hybridization is a bimolecular reaction, the kinetics of hybridization may be dependent on the concentration of both molecular species. When using a pool of probes (also known as a library), the concentration of each individual probe is reduced in proportion to the diversity of the library. Therefore, if we attempt to simultaneously target 1000 genes, the concentration of each probe may be reduced by at least 1000-fold, and potentially more (e.g., if multiple probes are used per gene).
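The dilution argument above amounts to simple arithmetic, sketched below. The sketch assumes an equimolar pool, which real amplified libraries only approximate; the function names and example values are hypothetical.

```python
def per_probe_concentration(pool_concentration, n_targets, probes_per_target=1):
    """Concentration of each individual probe in an equimolar pool.

    The result is in the same units as pool_concentration.
    """
    if n_targets < 1 or probes_per_target < 1:
        raise ValueError("n_targets and probes_per_target must be at least 1")
    return pool_concentration / (n_targets * probes_per_target)

def required_pool_concentration(target_per_probe, n_targets, probes_per_target=1):
    """Total pool concentration needed to hold each probe at target_per_probe."""
    if n_targets < 1 or probes_per_target < 1:
        raise ValueError("n_targets and probes_per_target must be at least 1")
    return target_per_probe * n_targets * probes_per_target
```

For instance, a pool targeting 1000 genes with one probe per gene leaves each probe at one thousandth of the total pool concentration, and four probes per gene reduces this by a further factor of four.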

For this reason, execution of massively multiplex targeted FISSEQ may be enhanced by increasing the overall concentration of the probe pool. A typical probe concentration for RNA FISH may be 1-150 nM. See e.g., Raj, Arjun, et al. “Imaging individual mRNA molecules using multiple singly labeled probes.” Nature methods 5.10 (2008): 877. For ISH of a complex library of probes, the concentration is scaled up. See e.g., Chen, Kok Hao, et al. “Spatially resolved, highly multiplexed RNA profiling in single cells.” Science 348.6233 (2015): aaa6090. In some instances, the concentration of the complex library of probes may be scaled to a concentration equal to or greater than about 10 μM, 20 μM, 40 μM, 60 μM, 80 μM, 100 μM, 120 μM, 140 μM, 160 μM, 180 μM, 200 μM, or any value therebetween. This can be further scaled. For example, the value may be further scaled by a factor of about 2, 4, 6, 8, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, or any value therebetween. Optionally, the value may be scaled 100-fold, which may be near the solubility limit of DNA in water, ~10 mM. Relatively cheap enzymatic amplification of microarray-synthesized oligonucleotide libraries makes this within the realm of possibility.

Crowding Agents with Enhanced Enzyme Compatibility

Optimal conditions for efficient probe hybridization and the downstream enzymatic reactions may be mutually exclusive in some instances. For example, in situ hybridization may be efficient in the presence of a crowding agent. An example of a crowding agent may be dextran sulfate. However, enzymes may be strongly inhibited by dextran sulfate. See e.g., Bouche, J. P. “The effect of spermidine on endonuclease inhibition by agarose contaminants.” Analytical biochemistry 115.1 (1981): 42-45. Because dextran sulfate is a highly charged, high molecular weight polymer, it may be difficult to wash it from the sample to the point where the downstream enzymatic reactions, e.g., reverse transcription, ligation, DNA polymerization, are not strongly inhibited. In the absence of a crowding agent, however, the kinetics of in situ hybridization may be orders of magnitude slower. See e.g., Wahl, Geoffrey M., Michael Stern, and George R. Stark. “Efficient transfer of large DNA fragments from agarose gels to diazobenzyloxymethyl-paper and rapid hybridization by using dextran sulfate.” Proceedings of the National Academy of Sciences 76.8 (1979): 3683-3687. Therefore, a crowding agent that does not strongly inhibit enzymatic reactions can be used in FISSEQ. Crowding agents may typically be high-molecular weight, high valency charged polymers. For example, crowding agents may be polymers such as polyacrylic acid, polyvinylsulfonic acid, and alginate. Optionally, crowding agents may be polymers similar to dextran sulfate. In some instances, the intermolecular organization of a crowding agent may be a factor in determining its effectiveness as a crowding agent.

As one example, dextran sulfate is understood to aid in the formation of networks (highly localized concentrations of probes) during hybridization, thus expediting the annealing process. The G-blocks of alginate are believed to participate in intermolecular cross-linking with divalent cations (e.g., Ca²⁺) to form hydrogels. Dextran sulfate is not known to form hydrogels, other than under exogenous chemical cross-linking reactions and in the presence of chitosan, neither of which is present during typical nucleic acid hybridization reactions. In some instances, the crowding agent may comprise an ability to self-associate in the formation of hydrogels. Alternatively, the crowding agent may not comprise the ability to self-associate in the formation of hydrogels. Optionally, this difference in the ability to self-associate in the formation of hydrogels may explain the difference between dextran sulfate and alginate in improving the kinetics of nucleic acid hybridization reactions.

For example, polyacrylic acid and polyvinylsulfonic acid may both function effectively as crowding agents while alginate may not. This may be due to the intermolecular organization of alginate, which reduces its effectiveness in crowding DNA. In some instances, polyacrylic acid may strongly inhibit enzymatic reactions, but polyvinylsulfonic acid may exhibit much less inhibition. One mechanism of inhibition may be chelation of essential metal or charged cofactors, such as Mg²⁺, Ca²⁺, Mn²⁺, Na⁺, phosphate, and other metal and charged ions, which are required for enzyme function. A polyion salt such as sodium polyacrylic acid may exchange sodium ions for magnesium ions in the presence of a magnesium-containing enzyme reaction buffer, reducing the effective concentration of the essential cofactor. Another mechanism of inhibition may be binding and structural damage to the enzyme, e.g., charge attraction and binding between charged domains of the enzyme and the ionic polymer, which may cause effective sequestration of the enzyme within the reaction, as well as disrupt electrostatic or charge interactions within the enzyme that are required for enzyme structure and related function. Wettability (or hydrophobicity), charge, and structure may alternatively or additionally contribute to the strength of polyion-protein interactions. Protein absorption on the polyion or within the polyionic network may contribute to an effective decrease in enzyme concentration.

In some embodiments, compounds that can function as crowding agents but have some property of molecular programmability may be used. For example, a polymer can be used as a crowding agent, after which its charged groups can be cleaved off or neutralized. This can convert the compound into a neutral polymer like PEG, which actually enhances the efficiency of enzymatic reactions. In some embodiments, polymers can function as a crowding agent and then be specifically degraded into small monomers that can be easily washed from the sample. The chemistry of passivation or degradation of the crowding agent needs to be orthogonal to nucleic acids, i.e., it does not degrade nucleic acids or render them incompatible. Suitable cleavable functional groups include alpha-hydroxy acids, which can be cleaved by sodium periodate; beta-keto acids, which can be cleaved with heat; phosphorothioate linkages, which can be cleaved with silver ions; disulfide linkages, which can be cleaved by reduction into thiols; and other types of chemical linkages which may be cleaved by photo- or chemical treatment.

Examples of programmable polyions or polyelectrolytes for enzyme-compatible enhancement of nucleic acid hybridization kinetics include polycondensation reactions of Cys(Lys)nCys, polymers such as PEG, PVA, or PAA, which may be subsequently modified via a cleavable linker to include chemical groups conferring ionic charge, or polymers formed from monomers including cleavable linkages, such that the polymer may be degraded subsequent to functioning as a crowding agent. See e.g., Oupický, David, Alan L. Parker, and Leonard W. Seymour. “Laterally stabilized complexes of DNA with linear reducible polycations: strategy for triggered intracellular activation of DNA delivery vectors.” Journal of the American Chemical Society 124.1 (2002): 8-9. As an alternative to ionic charge, these polymers may include non-ionic groups that become hydrated in solution, which also enhance nucleic acid hybridization rates by molecular crowding and/or sequestration of water.

In various embodiments, the disclosure provides methods to enhance the hybridization of diverse libraries of probes for targeting the capture of RNA, cDNA, and DNA species for detection via FISSEQ. In some embodiments, the method comprises adding a crowding agent to the hybridization buffer. In some embodiments, a hybridization buffer containing high salt, such as SSC (sodium chloride sodium citrate buffer), can be used. The salt concentration can be 1×, 2×, 5×, 10×, or more. In some other embodiments, the salt concentration can be from about 1× to 3×, from about 3× to 5×, or from about 5× to 10×. In some embodiments, a hybridization buffer containing blocking agents can be used. The blocking agents can reduce non-specific binding of probes to off-target sequences and/or prevent electrostatic interactions with other components of the sample. Exemplary blocking agents include yeast tRNA, salmon sperm DNA, detergents such as Triton-X, Tween 20, and SPAN, peptides such as BSA, and other agents such as Ficoll. In some embodiments, a hybridization buffer containing agents that alter the annealing properties of DNA, such as the melting temperature, can be used. Exemplary agents that can alter the annealing temperature include formamide. In some embodiments, high concentrations of probe can be used. Exemplary concentrations of probes can include 1-150 nM, 1-150 μM, or up to 10 mM. In some embodiments, a crowding agent, such as dextran sulfate or polyacrylic acid, can be used. In some embodiments, a crowding agent can have an average molecular weight Mn of 1-10 kDa, 10-20 kDa, 10-100 kDa, 100-300 kDa, or 100-1000 kDa. In some embodiments, a crowding agent can have a charge density of 1-10%, 10-30%, 10-99%, or 100% monomer occupancy. In some embodiments, a crowding agent can be present at 1%, 5%, 10%, 15%, 20%, or more weight per volume in the reaction.

The disclosure provides a crowding agent, for example, a polyionic, polyelectrolyte, or hydrophilic and strongly hydrating polymer, comprising a polymer backbone and one or more hydrating groups. The hydrating groups can be ionic, electrolytic, or hydrophilic. In some embodiments, the hydrating groups may be specifically inactivated, e.g., by rendering an ionic group to have neutral charge, or by rendering a strongly hydrating group weakly hydrating.

In some embodiments, the inactivation chemistry is substantially nonreactive with RNA, DNA, proteins, and/or other types of biomolecules. In some embodiments, the inactivated polymer is compatible with enzymatic reactions.

The disclosure provides a crowding agent, for example a polyionic, polyelectrolyte, or hydrophilic and strongly hydrating polymer, comprising a cleavable linkage between the polymer backbone and the hydrating group. In some embodiments, the cleavable linkages comprise alpha-hydroxy acids, which can be cleaved by sodium periodate. In some embodiments, the cleavable linkages comprise beta-keto acids, which can be cleaved with heat. In some embodiments, the cleavable linkages comprise phosphorothioate linkages, which can be cleaved with silver ions. In some embodiments, the cleavable linkages comprise disulfide linkages, which can be cleaved by reduction into thiols. Other types of chemical linkages may be cleaved by photo- or chemical treatment.

The disclosure provides a crowding agent, for example a polyionic, polyelectrolyte, or hydrophilic and strongly hydrating polymer, comprising cleavable linkages along the backbone of the polymer, wherein cleavable linkages include those disclosed herein.

The disclosure provides use of a crowding agent for targeted RNA or DNA detection, wherein a plurality of probes is hybridized in situ using a hybridization buffer containing one of the crowding agents disclosed herein.

The disclosure provides a method of detecting RNA and DNA, comprising the step of hybridizing a plurality of probes in situ using a hybridization buffer containing one of the crowding agents disclosed herein. In some embodiments, the methods further comprise the step of triggering cleavage of the cleavable groups present in the crowding agent.

The disclosure provides a method of detecting RNA and DNA, comprising the step of hybridizing a plurality of probes in situ using a hybridization buffer containing one of the crowding agents disclosed herein and further comprising the step of inactivating the hydrating groups.

DNA Array Synthesis of Probe Pools

A DNA microarray (also commonly known as an array, DNA chip, biochip, or chip) may refer to a collection of microscopic DNA spots attached to a solid surface. See e.g., Heller, Michael J. “DNA microarray technology: devices, systems, and applications.” Annual review of biomedical engineering 4.1 (2002): 129-153. Microarray DNA synthesis platforms, offered commercially by Agilent, CustomArray, and Twist Bioscience, may in some instances be used to generate massively complex libraries of short (approximately 200 nucleotide) oligonucleotides. See e.g., Kosuri, Sriram, and George M. Church. “Large-scale de novo DNA synthesis: technologies and applications.” Nature methods 11.5 (2014): 499-507. Microarray synthesis may refer to the synthesis of DNA or nucleic acid analog oligonucleotides attached to a solid substrate. Commercial supplier Twist Bioscience, for example, features microarrays containing 9,600 wells with 121 discrete oligonucleotide species synthesized per well, for a total of approximately 1.16 million oligonucleotides per array. Commercial supplier Agilent's OLS libraries contain just over 244,000 oligonucleotide species, while the DNA microarray of commercial supplier CustomArray synthesizes just over 94,000. Each DNA species may be synthesized in minute quantities, such as picomoles (10⁻¹² moles) of DNA molecules. Each DNA microarray synthesis technology may vary in features such as error rate, oligonucleotide length, and sequence limitations such as homopolymer repeats and secondary structure. These libraries of oligonucleotides are typically liberated from the solid support substrate into a solution of DNA species, which represents a renewable source of single-stranded DNA probes generated using techniques to highly amplify and process the library, in whole or in specific subpools. See e.g., Beliveau, Brian J., Nicholas Apostolopoulos, and Chao-ting Wu. “Visualizing genomes with Oligopaint FISH probes.” Current Protocols in Molecular Biology (2014): 14-23; Chen, Kok Hao, et al. “Spatially resolved, highly multiplexed RNA profiling in single cells.” Science 348.6233 (2015): aaa6090; Kosuri, Sriram, et al. “Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips.” Nature biotechnology 28.12 (2010): 1295-1299. Alternatively, the oligonucleotides may be amplified directly from the solid support, in whole or in specific subpools.

Computational Design

Unlike traditional DNA synthesis, for which sequences can be designed manually or individually using computational tools, sequences for array DNA synthesis can, given its scale, be designed with computational pipelines for sequence design and management. See e.g., Rozen, Steve, and Helen Skaletsky. “Primer3 on the WWW for general users and for biologist programmers.” Bioinformatics methods and protocols (1999): 365-386; Rouillard, Jean-Marie, Michael Zuker, and Erdogan Gulari. “OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach.” Nucleic acids research 31.12 (2003): 3057-3062. These methods may consider both aspects of probe design related to the function of the probes as well as idiosyncrasies of the array manufacturing process.

DNA Probes

For designing probes complementary to genomic sequences or RNA sequences derived from the genome, a custom Genome Tools Python library can be used to memory map the chromosomal sequence files and provide annotation-derived indexing. This approach may enable lazy-loading of chromosomal regions of interest while minimizing excessive memory use, disk thrashing, performance bottlenecks, and other downsides associated with attempting to store full chromosomal data in primary memory. When using a solid-state drive (SSD), accessing sequences may be approximately 80% as fast as in-memory access, but with a minimal memory footprint containing only index metadata and lazy-loaded regions of interest. The GFF/GTF file formats (General Feature Format/Gene Transfer Format) provided by Ensembl may be used for storing genome annotations. See e.g., Kawaji, Hideya, and Yoshihide Hayashizaki. “Genome annotation.” Bioinformatics: Data, Sequence Analysis and Evolution (2008): 125-139. To allow facile compilation of gene target lists, the Gencode or other reference annotation and translation tables can be used to map between genomic, transcriptomic, and protein annotations. See e.g., Harrow, Jennifer, et al. “GENCODE: producing a reference annotation for ENCODE.” Genome biology 7.1 (2006): 1. The appropriate reference genome build can be selected on a per-application basis, as individual projects and collaborators rely on build-specific datasets for design-of-experiment. In some instances, the stable GRCh37 annotation and sequence assembly release can be used.
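The memory-mapping and lazy-loading idea described above can be illustrated with a minimal sketch. It is not the Genome Tools library itself; the class name `MappedSequence`, the flat one-byte-per-base file layout, and the toy `index` dictionary standing in for annotation-derived indexing are all assumptions made for illustration.

```python
import mmap
import os
import tempfile


class MappedSequence:
    """Memory-map a flat file of bases (assumed: no header, no line breaks)."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded only when touched.
        self._mm = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)

    def fetch(self, start, end):
        """Return bases in the 0-based, half-open interval [start, end)."""
        return self._mm[start:end].decode("ascii")

    def __len__(self):
        return len(self._mm)

    def close(self):
        self._mm.close()
        self._fh.close()


# Toy demonstration with a temporary "chromosome" file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".seq") as tmp:
    tmp.write(b"ACGTACGTTTGACCAGGTAA")
    demo_path = tmp.name

chrom = MappedSequence(demo_path)
# Hypothetical annotation-derived index: feature name -> (start, end).
index = {"exon1": (0, 8), "exon2": (10, 16)}
regions = {name: chrom.fetch(s, e) for name, (s, e) in index.items()}
chrom.close()
os.remove(demo_path)
print(regions)
```

Only the index and the requested slices are held in Python objects; the operating system decides which pages of the underlying file are resident at any time.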

RNA Probes

For designing probes against RNA, the fact that RNA species are present in many isoforms due to alternative splicing may be taken into consideration. When the goal is to detect a particular sequence feature of the RNA, such as a particular exon, intron, sequence junction, expressed polymorphism, or site of RNA editing, the approach may be limited to designing probes that specifically target that segment of the RNA molecule. In these cases, the Genome Tools library may be used to generate annotation-derived probe sequences, constrained by the probe sequence design logic.

In some instances, it may be desirable to detect any isoforms of an RNA species. To maximize the generality of the probes of the present disclosure across transcript isoforms, the Genome Tools library may be used to create an exon “pileup” for the target RNA species. Conceptually, the pileup may be intended to identify exonic regions that are most common across all isoforms. In practice, the pileup may simply be an array, the extents of which may be defined by the outermost bounds of the transcribed sequence (i.e., the first base of the 5′-most transcript variant and the last base of the 3′-most variant), with respect to the genome. The value at each position in the array may be the number of annotated transcript variants that have exonic sequence at the respective location among all isoforms. This method may assume that all transcript isoforms are equally likely to be expressed in any given experiment. The sequence design can be improved by incorporating prior knowledge about the tissue- and cell-type specific expression patterns of transcript isoforms to improve the fraction of probes complementary to expressed RNA sequence. There may be additional means of leveraging annotation data to provide weighted exon scoring with respect to organism-level transcript frequencies or annotation confidence.
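The exon pileup described above can be sketched in a few lines. The function names, the representation of each isoform as a list of 0-based half-open `(start, end)` exon intervals, and the mean-based segment score are assumptions for illustration rather than the Genome Tools implementation.

```python
def exon_pileup(isoforms):
    """Count, per genomic position, how many isoforms are exonic there.

    isoforms: list of transcripts, each a list of (start, end) exon
    intervals in 0-based, half-open genomic coordinates (assumed layout).
    Returns (offset, pileup) where pileup[i] refers to position offset + i.
    """
    offset = min(start for tx in isoforms for start, _ in tx)
    extent = max(end for tx in isoforms for _, end in tx)
    pileup = [0] * (extent - offset)
    for tx in isoforms:
        covered = set()  # a set avoids double-counting overlapping exons
        for start, end in tx:
            covered.update(range(start, end))
        for pos in covered:
            pileup[pos - offset] += 1
    return offset, pileup


def segment_score(offset, pileup, start, end, n_isoforms):
    """Mean fraction of isoforms that are exonic across [start, end)."""
    window = pileup[start - offset:end - offset]
    return sum(window) / (len(window) * n_isoforms)


# Three toy isoforms of one gene.
isoforms = [
    [(100, 105), (110, 115)],
    [(102, 105), (110, 118)],
    [(110, 115)],
]
offset, pileup = exon_pileup(isoforms)
shared = segment_score(offset, pileup, 110, 115, len(isoforms))
print(offset, pileup, shared)
```

Here every isoform counts equally, matching the stated assumption that all isoforms are equally likely to be expressed; per-isoform weights could replace the `+= 1` increment if expression priors are available.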

Probe Design Logic

Given an assay, a thermodynamic target melting temperature for the probe design can be determined. The target sequences provided by Genome Tools for genomic or transcriptomic targets can then be chunked into small candidate probe sequences, such as 15 nucleotide segments. For gene-specific RT primers, the length of probes can often be a compromise between enforcing specificity and minimizing propensity for self-circularization. In some embodiments, the full “adaptor-barcode-probe” construct is no more than 40 nucleotides in length to minimize self-circularization. In some other embodiments, the full “adaptor-barcode-probe” construct can be 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 45-50, 50-55, 55-60, 60-65, 65-70, 70-75, 75-80, 80-85, 85-90, 90-95, or 95-100 nucleotides in length. In some embodiments, the full “adaptor-barcode-probe” construct can be at least 20, at least 30, at least 40, at least 50, at least 60, at least 70, at least 80, at least 90, at least 100, or at least 1500 nucleotides in length. After the initial chunking, each candidate probe can be scored using metrics intended to predict its specificity and efficiency in the context of both the initial synthesis and FISSEQ sample preparation. For example, any segments containing a G quadruplex can be excluded. The probe can then be scored based on melting temperature, which also provides an implicit GC metric if the probe length is pre-defined. In some embodiments, the probe length can be determined and fixed, which simplifies the array synthesis by providing that all oligonucleotides are of equal length. (It is possible to add padding sequences to the ends of oligonucleotides to enable array synthesis of libraries with variable length. However, this may complicate probe maturation and downstream processing, as the final probes will also have a distribution of lengths, limiting the degree to which size selection can be used to purify the probe pool.) For RNA probes, the segments can also be scored using the exon pileup described herein.
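The chunk, filter, and score sequence described above can be sketched as follows. Several pieces are assumptions chosen for brevity: sliding windows with a configurable `step`, a common regular-expression heuristic for G quadruplex motifs, and the Wallace rule (2 °C per A/T, 4 °C per G/C) as a stand-in for a nearest-neighbor melting temperature calculation.

```python
import re

# Heuristic G quadruplex motif: four G-runs of >= 3 separated by 1-7 nt loops.
G_QUAD = re.compile(r"(G{3,}[ACGT]{1,7}){3}G{3,}")


def wallace_tm(seq):
    """Rough melting temperature for short oligos (Wallace rule; assumption)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_probes(target, length=15, target_tm=46, step=1):
    """Chunk target into fixed-length segments and rank them by Tm deviation.

    Returns a list of (abs(Tm - target_tm), start, segment), best first.
    Segments matching the G quadruplex heuristic are excluded.
    """
    ranked = []
    for start in range(0, len(target) - length + 1, step):
        segment = target[start:start + length]
        if G_QUAD.search(segment):
            continue
        ranked.append((abs(wallace_tm(segment) - target_tm), start, segment))
    ranked.sort()
    return ranked


cands = candidate_probes("ACGT" * 5)
for deviation, start, segment in cands[:3]:
    print(start, segment, deviation)
```

Because the length is fixed, ranking by melting temperature also ranks implicitly by GC content, as noted in the text; a production pipeline would likely substitute a nearest-neighbor thermodynamic model and add the exon pileup score for RNA targets.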

Additional design constraints may be considered as a means of improving both individual and population-level probe performance. For example, probes may be screened to reduce the likelihood of probe heterodimerization. Finding a set of mutually compatible probe sequences within a thermodynamic threshold for heterodimer formation may be a challenging computational task. One such approach could involve generating a graph data structure describing all pairwise interactions within a given probe pool and then using a network elimination algorithm to produce a set of probe nodes with minimal or zero interconnectivity (indicating a lack of predicted heterodimerization reactions). While this approach has proven effective in similar contexts (Xu, Qikai, et al. “Design of 240,000 orthogonal 25mer DNA barcode probes.” Proceedings of the National Academy of Sciences 106.7 (2009): 2289-2294), it may be computationally infeasible for this application, as the presence of our many other constraints may preclude convergence on an acceptable solution. This effort would be enabled by better understanding of the sequence-dependency of downstream FISSEQ steps, such as RT, which would improve the metrics for network elimination.
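The graph-and-elimination idea described above can be sketched on a toy scale. The pairwise test used here (a shared reverse-complementary k-mer) is only a stand-in for a thermodynamic heterodimer threshold, and the greedy removal of the most-connected probe is one simple elimination heuristic among many; both are assumptions for illustration.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def may_dimerize(a, b, k=6):
    """Stand-in predicate: True if a k-mer of a can base-pair with b."""
    rc_b = revcomp(b)
    return any(a[i:i + k] in rc_b for i in range(len(a) - k + 1))


def compatible_subset(probes, k=6):
    """Greedily drop the most-connected probe until no predicted dimers remain.

    probes must be unique sequences; returns the surviving probes.
    """
    graph = {p: set() for p in probes}
    for a, b in combinations(probes, 2):
        if may_dimerize(a, b, k):
            graph[a].add(b)
            graph[b].add(a)
    while graph:
        worst = max(graph, key=lambda p: len(graph[p]))
        if not graph[worst]:
            break  # no edges left: remaining probes are mutually compatible
        for neighbor in graph.pop(worst):
            graph[neighbor].discard(worst)
    return list(graph)


pool = ["AAAAAACCCCCC", "GGGGGGTTTTTT", "ACACACACACAC"]
kept = compatible_subset(pool)
print(kept)
```

Building the graph is quadratic in pool size, which is consistent with the text's caution that this approach may become infeasible for large pools with many additional constraints.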

In some instances, it may be reasonable to screen for specificity of the probe. A number of strategies may be used to computationally screen probes for off-target binding. A simple strategy would be to prune common k-mers from the pool. See e.g., Melsted, Pall, and Jonathan K. Pritchard. “Efficient counting of k-mers in DNA sequences using a bloom filter.” BMC bioinformatics 12.1 (2011): 1. Thermodynamic considerations can also be used; the OligoArray software (Rouillard, Jean-Marie, Michael Zuker, and Erdogan Gulari. “OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach.” Nucleic acids research 31.12 (2003): 3057-3062), for example, uses the BLAST algorithm with a short word size to find all similarities. The resulting similarity matrix can be used to compute the thermodynamic values (Tm, free energy, enthalpy, and entropy, using MFOLD with thermodynamic parameters from SantaLucia (Zuker, Michael. “Mfold web server for nucleic acid folding and hybridization prediction.” Nucleic acids research 31.13 (2003): 3406-3415; SantaLucia, John. “A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics.” Proceedings of the National Academy of Sciences 95.4 (1998): 1460-1465)) of all possible hybridizations between the target sequence and similar sequences. Potential sequences can then be eliminated using a thermodynamic threshold for cross-hybridization. The role of sub-regions of the primer in conferring probe specificity via enzyme mismatch sensitivity profiles can also be considered. For example, the 3′ ends of targeted RT primers may be especially sensitive to mismatches. See e.g., Ye, Jian, et al. “Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction.” BMC bioinformatics 13.1 (2012): 1. For designing MIPs and padlock probes, the ligase can be sensitive to mismatches within approximately 6 bases on each side of the nick. See e.g., Mitra, Robi D., et al. “Fluorescent in situ sequencing on polymerase colonies.” Analytical biochemistry 320.1 (2003): 55-65.
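The simple k-mer pruning strategy mentioned above can be sketched as follows. This version counts k-mers exactly with a `Counter` rather than with the Bloom filter of the cited work, and the function names, the `background` sequence list, and the `max_count` threshold are assumptions for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def prune_common_kmers(probes, background, k=8, max_count=1):
    """Keep probes whose k-mers each occur at most max_count times in background.

    background: iterable of reference sequences (e.g., transcriptome or genome
    fragments) against which off-target binding is a concern.
    """
    counts = Counter(km for seq in background for km in kmers(seq, k))
    return [
        probe for probe in probes
        if all(counts[km] <= max_count for km in kmers(probe, k))
    ]


background = ["ACGTACGTACGT", "TTTTGGGGCCCC"]
probes = ["ACGTTT", "TTGGGG"]
specific = prune_common_kmers(probes, background, k=4)
print(specific)
```

In this toy case the first probe is dropped because the k-mer ACGT occurs three times in the background, while every k-mer of the second probe occurs once. For genome-scale backgrounds an exact counter may not fit in memory, which is the motivation for probabilistic structures such as Bloom filters.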

Furthermore, it may be reasonable to consider the sequence-dependency of downstream FISSEQ steps. For example, RNA secondary structure and the presence of polysome complexes or paused ribosomes may inhibit access to the RNA by the probes. See e.g., Stahlberg, Anders, et al. “Properties of the reverse transcription reaction in mRNA quantification.” Clinical chemistry 50.3 (2004): 509-515. In some instances, nucleosomes may inhibit genomic access by DNA probes, as they do for Cas9. See e.g., Horlbeck, Max A., et al. “Nucleosomes impede Cas9 access to DNA in vivo and in vitro.” Elife 5 (2016): e12677. The enzymes themselves, such as ligases, polymerases, and the reverse transcriptase, may have intrinsic biases with respect to the sequence of the substrate. See e.g., Hafner, Markus, et al. “RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries.” Rna 17.9 (2011): 1697-1712. These sequence dependencies can be measured directly from experiments using probe-level barcodes.

Barcode Assignment

Two approaches to barcode assignment can be to randomly assign barcodes from a pool of k-mers, or to assign barcodes using an iterator function to increment the barcodes. However, barcodes designed using these strategies may have limited capacity for error correction or error detection, and may generate sequences that are sub-optimal for synthesis. Instead, the assignment can be started with a large pool of barcodes, derived from the pool of k-mers and excluding homopolymer and GC runs ≥4, as well as G quadruplex. From this pool, a set of g barcodes separated by Hamming distance h can be identified using a graph-based algorithm (Conway, Nicholas J. and Pruitt, Benjamin. “Libnano, a low-level python library for DNA sequence file io, searching, and manipulation.” Unpublished GitHub Repository; Hagberg, Aric A., et al. “Exploring network structure, dynamics, and function using NetworkX.” Proceedings of the 7th Python in Science Conference (SciPy2008) 11-15, Pasadena, Calif. USA). The Hamming distance may provide for detection and correction of errors in sequenced barcodes, as it can require a certain number of sequencing errors to cause one barcode to be detected as another barcode. For example, using an iterator to increment barcodes might assign “AA” to the first probe, “AT” to the second, “AG” to the third, and so on. Using this strategy, a sequencing error in the second base of this dinucleotide barcode would cause one barcode to be detected as another valid barcode in the set. If the barcodes are separated by a sufficient Hamming distance, most sequencing errors generate invalid barcode sequences, which can simply be mapped to the nearest valid barcode sequence. More sophisticated error correction could also use heuristics that consider error bias and base call certainty. These barcodes can be paired with the constant adaptor features of the probe, such as the “T2S” (ACT TCA GCT GCC CCG GGT GAA GA) sequencing primer annealing region, also requiring the combined sequence to satisfy homopolymer restrictions and fall within homodimer and hairpin thermodynamic thresholds.
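The barcode selection and nearest-barcode correction described above can be sketched as follows. This version uses a simple greedy scan rather than the graph-based algorithm referenced in the text, and the function names, the run-exclusion regular expressions, and the tie-handling rule in `decode` are assumptions for illustration.

```python
import re
from itertools import product

# Homopolymer runs >= 4 or G/C-only runs >= 4 (exclusion rule from the text).
BAD_RUN = re.compile(r"(.)\1{3}|[GC]{4}")
# Heuristic G quadruplex motif (only relevant for barcodes of >= 15 nt).
G_QUAD = re.compile(r"(G{3,}[ACGT]{1,7}){3}G{3,}")


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def candidate_pool(k):
    """Yield all k-mers that pass the sequence-content filters."""
    for bases in product("ACGT", repeat=k):
        barcode = "".join(bases)
        if not BAD_RUN.search(barcode) and not G_QUAD.search(barcode):
            yield barcode


def select_barcodes(k, h, g=None):
    """Greedily pick up to g barcodes with pairwise Hamming distance >= h."""
    chosen = []
    for barcode in candidate_pool(k):
        if all(hamming(barcode, c) >= h for c in chosen):
            chosen.append(barcode)
            if g is not None and len(chosen) == g:
                break
    return chosen


def decode(read, barcodes):
    """Map a sequenced barcode to the nearest valid one; None on a tie."""
    ranked = sorted((hamming(read, bc), bc) for bc in barcodes)
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]


barcodes = select_barcodes(k=6, h=3)
original = barcodes[0]
# Introduce a single substitution error at the first base.
read = ("T" if original[0] != "T" else "A") + original[1:]
print(len(barcodes), original, read, decode(read, barcodes))
```

With a minimum distance of 3, any single substitution leaves the read at distance 1 from its true barcode and at least 2 from every other barcode, so it is corrected unambiguously; two substitutions are detected but may not be correctable.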

Probe Pool Selection

Having assembled the full probes, containing sequences complementary to target RNA or DNA molecules, adaptor sequences, such as T2S, and barcodes, a final screen can be performed to eliminate any probes containing homopolymer runs, or that form homodimers or hairpins given thermodynamic thresholds. For example, probes having homodimer or hairpin Tm > 30° C. can be eliminated. Heterodimers or off-target interactions created during assembly of the full probe sequence can also be considered. In synthesizing probe libraries, the library size can be limited by the number of distinct oligonucleotide features on the microarray. When this happens, the top n probes per gene or target locus can be taken using the scoring metrics described herein.
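The final screen and top-n selection described above can be sketched as follows. The record layout (dictionaries with `target`, `seq`, `score`, `homodimer_tm`, and `hairpin_tm` keys), the convention that a higher score is better, and the precomputed melting temperatures are all assumptions; the thermodynamic values would come from whatever folding model the pipeline uses.

```python
from collections import defaultdict


def top_n_per_target(records, n, tm_cutoff=30.0):
    """Drop probes predicted to form homodimers or hairpins, then keep the
    n best-scoring probes for each gene or target locus.

    records: iterable of dicts with keys target, seq, score,
    homodimer_tm, hairpin_tm (hypothetical layout; higher score is better).
    """
    by_target = defaultdict(list)
    for record in records:
        if record["homodimer_tm"] > tm_cutoff or record["hairpin_tm"] > tm_cutoff:
            continue
        by_target[record["target"]].append(record)
    return {
        target: sorted(group, key=lambda r: r["score"], reverse=True)[:n]
        for target, group in by_target.items()
    }


records = [
    {"target": "GENE1", "seq": "A", "score": 0.90, "homodimer_tm": 10, "hairpin_tm": 12},
    {"target": "GENE1", "seq": "B", "score": 0.50, "homodimer_tm": 10, "hairpin_tm": 12},
    {"target": "GENE1", "seq": "C", "score": 0.99, "homodimer_tm": 45, "hairpin_tm": 12},
    {"target": "GENE2", "seq": "D", "score": 0.10, "homodimer_tm": 0, "hairpin_tm": 0},
]
pool = top_n_per_target(records, n=1)
print({target: [r["seq"] for r in group] for target, group in pool.items()})
```

The value of n can be chosen so that the number of targets multiplied by n stays within the number of oligonucleotide features available on the array.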

Subpool Amplification

Given the large number of oligonucleotide features per microarray, multiple probe libraries can be synthesized on a single chip. In order to generate a single probe library, subpool amplification may be used to specifically amplify a fraction of the probe population for maturation and use in FISSEQ. See e.g., Kosuri, Sriram, et al. “Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips.” Nature biotechnology 28.12 (2010): 1295-1299. Even in the case where an array contains only a single library, additional PCR primer sequences can still be included, as the amount of raw material produced by the array may be on the scale of nanograms. To avoid internal mispriming from payload sequences and cross-talk during subpool amplification, primers may be automatically generated for these reactions using heuristics derived from quantitative modeling and empirical data. This method can allow incorporation of sequence features into the priming regions, such as Type IIS restriction sites, which may be used to process the library into a mature probe pool.

Probe Maturation Strategies

After collecting the DNA library from the microarray chip, an initial global or subpool amplification can be performed. For use in FISSEQ, micrograms or even milligrams of the mature single-stranded probe library, which does not include the amplification primer sequences, can be prepared. There are many strategies for achieving this, for example, in vitro transcription or PCR-based methods, which are described herein. Methods to process probes are also provided herein, such as for synthesis of MIPs and padlock probes. These types of probes by design have the variable sequences on the 5′ and 3′ ends of the probe. Type IIS restriction enzymes can be used to cut at a defined site, which is outside of the enzyme recognition sequence. See e.g., Szybalski, Waclaw, et al. “Class-IIS restriction enzymes - a review.” Gene 100 (1991): 13-26. The efficiency of cutting can be enhanced by using a splint oligonucleotide that extends over the restriction enzyme recognition sequence, to just beyond the start of the variable sequence, using inosines or universal bases to generate a duplex 1-3 bases past the cutting site.

IVT

In vitro transcription (IVT) may be enabled by including a T7 RNA polymerase promoter site in the probe library, which is used to linearly amplify the entire probe pool into highly abundant single-stranded RNA transcripts. A targeted reverse transcription can then be performed to efficiently convert the RNA molecules into single-stranded cDNA, after which the RNA is degraded. See e.g., Chen, Kok Hao, et al. “Spatially resolved, highly multiplexed RNA profiling in single cells.” Science 348.6233 (2015): aaa6090. The single-stranded cDNA may be further modified, as by splint restriction, wherein an oligonucleotide is annealed to the cDNA and the duplex DNA region is targeted by a restriction enzyme for digestion.

PCR

Another strategy may be to use PCR to exponentially amplify the library, followed by specific digestion of one of the duplex strands. One method for achieving this may be to include a 5′ phosphate on one primer, which allows the resulting strand to be digested by lambda exonuclease. See e.g., Beliveau, Brian J., et al. “Versatile design and synthesis platform for visualizing genomes with Oligopaint FISH probes.” Proceedings of the National Academy of Sciences 109.52 (2012): 21301-21306. Before or after exonuclease digestion, the probe can be further processed using restriction enzymes.

Purification and Validation

The products may also be further purified, as by ethanol precipitation, beads, or columns, to remove dNTPs or other reaction products and also to desalt the oligonucleotides. The probes can also be purified using polyacrylamide gel electrophoresis (PAGE) or High Performance Liquid Chromatography (HPLC) to select only correctly-sized products. To ensure that the final library has the correct sequences, next-generation sequencing can be used to measure the distribution of payload sizes, error rates during synthesis and amplification, and sequence diversity (FIGS. 2A-2C).

The present disclosure provides methods to identify a set of target RNA and DNA sequences. In some embodiments, one or more DNA loci can be identified. In some embodiments, one or more DNA sequences, including DNA sequence or structural variants, can be identified. In some embodiments, one or more RNA species can be identified. In some embodiments, one or more RNA sequences, including RNA editing, splicing, and expressed sequence variation, can be identified.

The present disclosure provides that reference sequence databases are curated and mined to discover appropriate primer sequences for detection in situ using the methods described above, including for sequences of RT primers, PCR primers, MIP and padlock probes, Cas9 guide RNAs, and other types of targeted probes.

The disclosure provides that candidate sequences are scored using metrics intended to predict their specificity and efficiency in the context of both the initial synthesis and FISSEQ sample preparation. Exemplary metrics to be considered include sequence content, both overall, e.g. GC content, and local, e.g. G quadruplex, homopolymer runs; melting temperature and other thermodynamic properties such as free energy, enthalpy and entropy; bias of proteins and enzymes used in the probe complex, such as Cas9 and C2c2, or in downstream processing, such as the reverse transcriptase, DNA polymerase, DNA ligase, RNA ligase, Circ-ligase; secondary structure and homodimer formation; limitations or optimizations of the microarray synthesis platform; target sequence features of the RNA or DNA, including secondary structure and in situ protein occupancy (e.g., nucleosomes and ribosomes); specificity of nucleic acid hybridization relative to other sequences known to be present; and/or sequence specificity of a subset of the probe relevant to an enzymatic step, e.g., the 3′ end of an RT primer, or the seed region of a microRNA.

The disclosure provides curation of a set of identifying barcode sequences, which can be detected by sequencing by synthesis (SBS), sequencing by ligation (SBL), or sequencing by hybridization (SBH), for the identification of the target molecule. In some embodiments, the curation process may consider features such as: sequence content, both overall, e.g. GC content, and local, e.g. G quadruplex, homopolymer runs; melting temperature and other thermodynamic properties such as free energy, enthalpy and entropy; bias of proteins and enzymes used in the probe complex, such as Cas9 and C2c2, or in downstream processing, such as the reverse transcriptase, DNA polymerase, DNA ligase, RNA ligase, Circ-ligase, or enzymes used for sequencing; secondary structure and homodimer formation; limitations or optimizations of the microarray synthesis platform; and error detection and error correction features, such as Hamming distance, as well as parity bits and codes constructed using algorithms such as Golay.

The disclosure provides computational design of the DNA microarray synthesis product, including the target sequences mined and scored as described above. The present disclosure also provides methods to design additional sequences including those for PCR amplification of the library; sub-pool amplification of a subset of the total library; probe purification; sequences enabling probe expression and maturation, e.g. the T7 RNA polymerase promoter sequence, sites for restriction enzymes, etc.; sequences relevant to FISSEQ or in situ detection, e.g., sequencing adaptors, HCR or CHCR initiators or adaptor sequences, sites of primary, secondary, or additional probing in situ; and barcoding sequences used for identification of the probe and cognate target molecule.

The disclosure provides curation of a library from the candidate sequences, and/or modified candidate sequences (e.g., after adding adaptors and other features necessary for subsequent amplification and processing, FISSEQ, identifying barcodes, etc.). In some embodiments, the methods comprise the steps of discovering a mutually compatible set of candidate sequences. In some embodiments, the methods further comprise considering heterodimerization and off-priming. In some embodiments, the methods further comprise considering sequences formed during assembly of the full sequence, such as at the junction of sequence features, e.g. the junction between a subpool amplification primer and the segment of the primer responsible for binding a target nucleic acid. In some embodiments, the methods comprise considering sequence content, both overall, e.g. GC content, and local, e.g. G quadruplex, homopolymer runs; melting temperature and other thermodynamic properties such as free energy, enthalpy and entropy; secondary structure and homodimer formation; and limitations or optimizations of the microarray synthesis platform.

The disclosure provides methods of amplifying and maturing a probe library for targeted RNA or DNA detection in situ from a DNA microarray or DNA oligonucleotides liberated from a DNA microarray. In some embodiments, the methods comprise PCR. In some embodiments, the methods comprise enzymatic processing, such as cleavage by restriction enzymes to remove PCR and other adaptor sequences which are irrelevant or deleterious to use for in situ RNA or DNA detection. In some embodiments, the methods comprise inclusion of chemical modifications, modified bases, or nucleic acid analogs during probe amplification and/or maturation. The modifications include, but are not limited to, chemical handles for cross-linking, including primary amines, biotin/streptavidin, and thiol; locked nucleic acid (LNA) bases, which are known to improve hybridization kinetics and specificity; and 2′-O-methyl RNA bases and phosphorothioate linkages, which are known to make oligonucleotides resistant to certain nuclease treatments. In some embodiments, the disclosure provides methods of generating a single-stranded final product, such as by lambda exonuclease digestion of a 5′ phosphate-bearing complementary strand or expression of RNA, such as by IVT, followed by RT and degradation of the RNA to form a single-stranded product. In some embodiments, the present disclosure provides methods for purification, including by adding handles for purification, such as biotin, or by PAGE, HPLC, using beads, or other methods known to a skilled artisan of cleaning up and purifying oligonucleotides.

In some embodiments, the present disclosure provides methods of using next generation sequencing (NGS) to validate the product library, including determination of variation in the presence and abundance of individual probes, and discovery of the relationship between identifying barcodes and the sequences relevant to molecular targeting, in the case where barcodes are synthesized randomly.
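
As one hedged illustration of discovering the barcode-to-target relationship from NGS data when barcodes are synthesized randomly, the Python sketch below tallies the targeting sequences observed with each barcode and assigns a barcode only when one target dominates. It assumes that each read has already been parsed into a (barcode, target) pair; the function name and the read-count and purity thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def associate_barcodes(read_pairs, min_reads=2, min_fraction=0.9):
    """Map each barcode to its dominant target.

    read_pairs: iterable of (barcode, target) tuples parsed from NGS reads.
    Returns (assigned, ambiguous): a dict of barcode -> target, and a list of
    barcodes with too few reads or no clearly dominant target.
    """
    by_barcode = defaultdict(Counter)
    for barcode, target in read_pairs:
        by_barcode[barcode][target] += 1

    assigned, ambiguous = {}, []
    for barcode, targets in by_barcode.items():
        total = sum(targets.values())
        target, count = targets.most_common(1)[0]
        if total >= min_reads and count / total >= min_fraction:
            assigned[barcode] = target
        else:
            ambiguous.append(barcode)
    return assigned, ambiguous
```

Barcodes reported as ambiguous could be excluded from the decoding table used for in situ identification, since a barcode linked to more than one targeting sequence cannot identify a single target.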

The disclosure provides methods for in situ targeted detection of RNA and/or DNA species, using probes synthesized by DNA microarray.

Terms and symbols of nucleic acid chemistry, biochemistry, genetics, and molecular biology used herein may follow those of standard treatises and texts in the field, e.g., Kornberg and Baker, DNA Replication, Second Edition (W.H. Freeman, New York, 1992); Lehninger, Biochemistry, Second Edition (Worth Publishers, New York, 1975); Strachan and Read, Human Molecular Genetics, Second Edition (Wiley-Liss, New York, 1999); Eckstein, editor, Oligonucleotides and Analogs: A Practical Approach (Oxford University Press, New York, 1991); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, Oxford, 1984); and the like.

Computer Control Systems

The present disclosure provides computer control systems that are programmed to implement methods of the disclosure. FIG. 7 shows a computer system 701 that is programmed or otherwise configured to aid in generation of said libraries of probes, or sequencing nucleic acids of interest, as described herein. The computer system 701 can regulate various aspects of the present disclosure, such as, for example, determination of target sequences of interest, and/or scoring of said probes. In some aspects, the computer system may be programmed to control release of reagents, activation of reactions (e.g., amplification reactions), and/or may initiate a sequencing reaction to take place. The computer system 701 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.

The computer system 701 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 705, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 701 also includes memory or memory location 710 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 715 (e.g., hard disk), communication interface 720 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 725, such as cache, other memory, data storage and/or electronic display adapters. The memory 710, storage unit 715, interface 720 and peripheral devices 725 are in communication with the CPU 705 through a communication bus (solid lines), such as a motherboard. The storage unit 715 can be a data storage unit (or data repository) for storing data. The computer system 701 can be operatively coupled to a computer network (“network”) 730 with the aid of the communication interface 720. The network 730 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 730 in some cases is a telecommunication and/or data network. The network 730 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 730, in some cases with the aid of the computer system 701, can implement a peer-to-peer network, which may enable devices coupled to the computer system 701 to behave as a client or a server.

The CPU 705 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 710. The instructions can be directed to the CPU 705, which can subsequently program or otherwise configure the CPU 705 to implement methods of the present disclosure. Examples of operations performed by the CPU 705 can include fetch, decode, execute, and writeback.

The CPU 705 can be part of a circuit, such as an integrated circuit. One or more other components of the system 701 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).

The storage unit 715 can store files, such as drivers, libraries and saved programs. The storage unit 715 can store user data, e.g., user preferences and user programs. The computer system 701 in some cases can include one or more additional data storage units that are external to the computer system 701, such as located on a remote server that is in communication with the computer system 701 through an intranet or the Internet.

The computer system 701 can communicate with one or more remote computer systems through the network 730. For instance, the computer system 701 can communicate with a remote computer system of a user (e.g., a user generating said probes of the current disclosure or a user utilizing such probes). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 701 via the network 730.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 701, such as, for example, on the memory 710 or electronic storage unit 715. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 705. In some cases, the code can be retrieved from the storage unit 715 and stored on the memory 710 for ready access by the processor 705. In some situations, the electronic storage unit 715 can be precluded, and machine-executable instructions are stored on memory 710.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 701, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 701 can include or be in communication with an electronic display 735 that comprises a user interface (UI) 740 for providing, for example, scoring of said probes, or showing detection and/or sequencing of biomolecules of interest using said libraries of probes. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface. In some instances, the computer system may be configured to be in communication with various other devices and may be programmed to control such devices. For example, the computer system may be in communication with various light sources (e.g., fluorescent light sources) and/or platforms for utilizing said probe libraries or platforms utilized for sequencing.

Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 705. The algorithm can, for example, be executed so as to generate said probes or libraries of probes of the current disclosure. The algorithms may comprise relevant parameters for designing and/or generating said probes. In some instances, the algorithms may comprise relevant parameters to implement detection of biomolecules of interest.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” or “includes” and/or “including,” when used in this specification, specify the presence of stated features, regions, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components and/or groups thereof.

Furthermore, relative terms, such as “lower” or “bottom” and “upper” or “top” may be used herein to describe one element's relationship to other elements as illustrated in the figures. It will be understood that relative terms are intended to encompass different orientations of the elements in addition to the orientation depicted in the figures. For example, if the element in one of the figures is turned over, elements described as being on the “lower” side of other elements would then be oriented on the “upper” side of the other elements. The exemplary term “lower” can, therefore, encompass both an orientation of “lower” and “upper,” depending upon the particular orientation of the figure. Similarly, if the element in one of the figures were turned over, elements described as “below” or “beneath” other elements would then be oriented “above” the other elements. The exemplary terms “below” or “beneath” can, therefore, encompass both an orientation of above and below.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. Numerous different combinations of embodiments described herein are possible, and such combinations are considered part of the present disclosure. In addition, all features discussed in connection with any one embodiment herein can be readily adapted for use in other embodiments herein. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

EXAMPLES

Example 1—Exemplary Probe Designs

An exemplary linear probe design is shown in FIGS. 1A-C. FIG. 1A depicts that a mature primer includes: (a) a sequence complementary to the RNA molecule at the 3′ end, which anneals to the RNA molecule in situ and primes RT; (b) a common adaptor sequence, from which RCA and sequencing reactions are primed; (c) a gene-level barcode at the 5′ end; and (d) 5′ phosphorylation. FIG. 1B depicts that the complementary region of the primer anneals to the target RNA species and primes an RT reaction, incorporating RNA-templated bases into the cDNA. FIG. 1C depicts that in the linear RCA amplicon each of the n tandem repeats contains the barcode as well as adjacent RNA-templated sequence, enabling quantification of capture specificity.
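
As a minimal sketch of the primer layout described for FIG. 1A, the Python example below builds a primer reading 5′-barcode, adaptor, target-complementary region-3′. The function names and the example sequences are hypothetical; the 5′ phosphorylation is a chemical modification and is represented only by a comment.

```python
def reverse_complement_of_rna(rna):
    """DNA reverse complement (written 5'->3') of an RNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "U": "A"}
    return "".join(pairs[base] for base in reversed(rna))


def design_rt_primer(target_rna_site, barcode, adaptor):
    """Assemble a linear RT primer, 5'->3': barcode, adaptor, then the
    region complementary to the target RNA site at the 3' end.
    The 5' phosphate would be specified separately when ordering or
    enzymatically maturing the oligonucleotide."""
    return barcode + adaptor + reverse_complement_of_rna(target_rna_site)


# Hypothetical target site, barcode, and adaptor for illustration only.
primer = design_rt_primer("AUGGCC", barcode="ACGT", adaptor="TTTT")
print(primer)
```

Because the complementary region sits at the 3′ end, extension by the reverse transcriptase appends RNA-templated bases directly after it, which is what places target-derived sequence adjacent to the barcode in each tandem repeat of the amplicon.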

Additional probe design and maturation strategies are shown in FIGS. 4A-E. FIGS. 4A-E depict an exemplary probe design and maturation strategy for manufacture of padlock or gap-fill probes by oligonucleotide library synthesis, such as by DNA microarray. FIG. 4A shows a schematic of the probe design featuring conserved sequences on the ends, which may also be used for amplification of the nucleic acid material, such as by PCR or IVT. Alternatively, additional sequences 3′ and 5′ of the red domains may be included (not shown) for the purpose of amplification or sub-pool amplification. A barcode domain is included for the purpose of molecular identification. A central sequence is substantially complementary to the target sequence for the purpose of directing the probe to the target molecule via a nucleic acid hybridization reaction. FIG. 4B shows: 1) After amplification of the probe pool to sufficient quantity, a splint ligation reaction circularizes the probe. FIG. 4C shows: 2) Second-strand synthesis, such as by a non-displacing DNA polymerase, generates a second complementary strand featuring a nick at the 5′ end of the second-strand synthesis primer. FIG. 4D shows: 3) A type IIS restriction enzyme is used to create a double-strand break within the targeting domain (annealing sequence). FIG. 4E shows: 4) The mature probe is isolated, such as by an electrophoretic gel purification technique. This process may be referred to as probe maturation, encompassing the steps of amplification and processing required for converting an as-synthesized nucleic acid probe into a form suitable for use in an assay.

Example 2—Validation of Microarray-Synthesized FISSEQ Primer Library

FIGS. 2A-C illustrate exemplary results of validation of a microarray-synthesized FISSEQ primer library. FIG. 2A shows that the payload length of this library is distributed around 0, suggesting that the synthesis and/or amplification of the library failed. The target payload size is indicated by the dotted line. FIG. 2B shows that a significant fraction of the pool contains a payload that matches the expected size, which is indicated by the dotted line. Note there is also a significant fraction of payloads within a few bases of the expected length, which reflects the main error mode of array synthesis: deletions. FIG. 2C shows a violin plot of the distribution of payloads from FIG. 2B, which shows that most members of the library are present at an average of a few copies. Some members are missing from the final library, and some members are present at approximately 10× the average level of the library.
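
The validation metrics described in this example (payload length distribution, missing members, and over-represented members) can be sketched in Python as follows. The sketch assumes that payload sequences have already been extracted from the NGS reads; the function name, the returned keys, and the default tenfold threshold are assumptions for illustration and do not reproduce the analysis actually used to generate FIGS. 2A-C.

```python
from collections import Counter


def summarize_library(payloads, designed, target_len, fold=10):
    """Summarize an NGS validation run.

    payloads:   payload sequences observed in the reads
    designed:   payload sequences that were designed into the library
    target_len: designed payload length
    fold:       multiple of the mean copy number that flags over-representation
    """
    length_hist = Counter(len(p) for p in payloads)
    observed = Counter(payloads)
    copies = {member: observed[member] for member in designed}
    mean_copies = sum(copies.values()) / len(designed)
    return {
        "length_hist": length_hist,
        "full_length_fraction": length_hist[target_len] / len(payloads),
        "mean_copies": mean_copies,
        "missing": [m for m, c in copies.items() if c == 0],
        "over_represented": [
            m for m, c in copies.items()
            if mean_copies > 0 and c >= fold * mean_copies
        ],
    }
```

A library whose length histogram peaks near zero would give a full-length fraction close to zero, corresponding to the failed case, whereas payloads one or a few bases short of the target length would appear as the deletion products of array synthesis.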

Example 3—Targeted FISSEQ Capture Schemes

FIGS. 3A-F depict exemplary targeted FISSEQ capture schemes. As shown in FIG. 3, the FISSEQ capture probe can be used to capture a target nucleic acid molecule or region, such as an RNA, mRNA, DNA, or genomic locus. The capture probe comprises the sequence domain substantially complementary to the target molecule sequence and a non-complementary tail region containing adaptor sequences, sequencing primer domains, and barcode sequence domains. FIG. 3A shows that a targeted polymerization reaction is directed to the target molecule via the targeting sequence domain, which serves to prime a nucleic acid polymerization reaction, such as reverse transcription of RNA into cDNA, or second-strand synthesis of a DNA sequence by a DNA polymerase. The primer is extended by the polymerase, incorporating endogenous sequence, and linked into the FISSEQ 3D hydrogel matrix to preserve the spatial localization of the molecule. Finally, the template is circularized and amplified, such as by rolling circle amplification, for detection via sequencing. FIG. 3B shows that a pre-circularized probe is hybridized against a target molecule and subsequently amplified, such as by rolling circle amplification, for detection via sequencing. FIG. 3C shows that a protein and nucleic acid capture probe complex is used to mediate selection of the target molecule, for the purpose of directing FISSEQ library construction biochemistry to the target molecule. Library construction biochemistry may include, but is not limited to, cutting, ligating, and other nucleic acid reactions for the purpose of sequencing, or association of a cognate barcode or otherwise detectable label. FIG. 3D shows that a “padlock probe” is designed such that the ends of the probe come into immediate contact when hybridized to the target molecule. A ligase reaction forms the phosphodiester bond circularizing the probe. Finally, the circularized template is amplified, such as by rolling circle amplification, for detection via sequencing. FIG. 3E shows that a probe is hybridized to the target molecule via the complementarity domain, with a subsequent non-target-molecule-dependent ligation reaction serving to circularize the probe, followed by amplification and sequencing. FIG. 3F shows that a “gap fill” probe (also commonly referred to as a molecular inversion probe, or MIP) is used for targeted FISSEQ. The hybridization arms of the probe anneal to the target molecule, forming a gap between the 3′ and 5′ ends. Subsequently, a nucleic acid polymerization reaction primed by the 3′ end of the probe serves to extend the probe, incorporating endogenous sequence-templated bases. Subsequently, a ligase forms a phosphodiester bond circularizing the probe. Finally, the circularized template is amplified, such as by rolling circle amplification, for detection via sequencing.
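
To illustrate the geometry of the padlock and gap-fill schemes, the following Python sketch derives the two hybridization arms of a probe from a target region written 5′ to 3′. With no gap the arms abut on the target, as for a padlock probe; with a nonzero gap the 3′ end must be extended across the intervening target bases before ligation, as for a gap-fill probe. The function names and the placeholder backbone are assumptions for illustration only.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_arms(target, arm_len, gap=0):
    """Return (five_prime_arm, three_prime_arm) for a probe against `target`.

    The probe's 5' arm pairs with the upstream (5') part of the target and
    its 3' arm pairs with the downstream (3') part, so that the probe's own
    3' and 5' ends face each other across `gap` target bases.
    """
    needed = 2 * arm_len + gap
    if len(target) < needed:
        raise ValueError("target region is shorter than both arms plus the gap")
    upstream = target[:arm_len]
    downstream = target[arm_len + gap:needed]
    return reverse_complement(upstream), reverse_complement(downstream)


def build_probe(target, arm_len, backbone, gap=0):
    """Linear probe, 5'->3': 5' arm, backbone (adaptor, barcode, etc.), 3' arm."""
    five_prime_arm, three_prime_arm = design_arms(target, arm_len, gap)
    return five_prime_arm + backbone + three_prime_arm
```

For a padlock probe, reading the ligated junction from the 3′ arm into the 5′ arm gives exactly the reverse complement of the target region, which is a convenient check on an arm design. For a gap-fill probe the same holds once the reverse complement of the gap bases, supplied by the polymerase, is placed between the two arms.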

Example 4—Targeted FISSEQ of OncoType Dx Panel

The OncoType Dx gene panel was used as an example to demonstrate the workflow of targeted FISSEQ. FIG. 5 depicts an exemplary image of targeted FISSEQ of a human breast cancer tissue biopsy sample. A pool of targeted reverse transcription primers initiates reverse transcription reactions at genes of the clinically relevant OncoType Dx gene panel. The image depicts a single base of the sequencing reaction, rotated in 3D to demonstrate molecular identification within the 3D FISSEQ hydrogel and original tissue sample. Gene expression profiles of the OncoType Dx genes are computed. FIG. 6 shows an exemplary experimental data summary of sequencing from FIG. 5. Upper text includes relevant experimental data. Lower left panel shows the location of molecular identification events superimposed over the tissue image. Lower middle panel shows the same molecules indicating a sequencing quality metric. Upper middle graphs show the distribution of sequencing signals for each base over the barcode sequence, demonstrating high quality sequencing data. Right table shows the genes included in the assay and associated sequence barcodes.

1.-102. (canceled)
103. A method for in situ nucleic acid sequence detection or identification of one or more target nucleic acid molecules of a cell, comprising: (a) providing said cell comprising a reaction mixture comprising said one or more target nucleic acid molecules and a plurality of probes, wherein said plurality of probes comprises a plurality of target-specific sequences, a plurality of adaptor sequences and a plurality of barcode sequences, wherein a probe of said plurality of probes comprises: (i) a target-specific sequence of said plurality of target-specific sequences that is complementary to a target sequence of a target nucleic acid molecule of said one or more target nucleic acid molecules; (ii) an adaptor sequence of said plurality of adaptor sequences coupled to said target-specific sequence, wherein said adaptor sequence is for conducting an amplification reaction on said probe when said target-specific sequence is hybridized to said target sequence; and (iii) a barcode sequence of said plurality of barcode sequences coupled to said adaptor sequence, wherein said barcode sequence is configured to allow detection or identification of said target sequence or at least a portion of said target nucleic acid molecule, and wherein said plurality of barcode sequences are different across said plurality of probes; (b) within said cell, subjecting said reaction mixture to conditions sufficient to permit said target-specific sequence to hybridize to said target sequence; and (c) using said barcode sequence to detect or identify said target sequence or said at least said portion of said target nucleic acid molecule.
104. The method of claim 103, wherein said cell is fixed.
105. The method of claim 103, wherein said cell is integrated with a hydrogel.
106. The method of claim 103, wherein said target nucleic acid molecule is a ribonucleic acid molecule, and wherein (b) further comprises subjecting said sequence to conditions sufficient to perform reverse transcription on said sequence to yield a complementary deoxyribonucleic acid molecule.
107. The method of claim 103, wherein said plurality of barcode sequences permits identification of different target sequences of different target nucleic acid molecules of said one or more target nucleic acid molecules.
108. The method of claim 103, wherein said plurality of adaptor sequences comprise identical sequences across said plurality of probes.
109. The method of claim 103, wherein said adaptor sequence is complementary to a primer for conducting said amplification reaction.
110. The method of claim 109, further comprising, prior to (c), binding said primer to said adaptor sequence.
111. The method of claim 110, further comprising, prior to (c), conducting said amplification reaction on said probe when said target-specific sequence is hybridized to said target sequence.
112. The method of claim 103, wherein said probe is a circular probe.
113. The method of claim 103, wherein said target-specific sequence, said adaptor sequence, and said barcode sequence are arranged contiguously from 3′ end to 5′ end of said probe.
114. The method of claim 103, further comprising circularizing said probe.
115. The method of claim 114, wherein said circularizing comprises ligating a 3′ end of said probe to a 5′ end of said probe when said target-specific sequence of said probe is hybridized to said target sequence.
116. The method of claim 115, wherein said circularizing comprises ligating said 3′ end to said 5′ end using a ligase and a splint oligonucleotide independent of said target nucleic acid molecule.
117. The method of claim 115, wherein said circularizing comprises extending said 3′ end of said probe with aid of a reverse transcriptase or a polymerase to yield an extended product, and ligating ends of said extended product together.
118. The method of claim 103, wherein said reaction mixture further comprises a plurality of nucleic acid-binding proteins, which plurality of nucleic acid-binding proteins mediate binding of said plurality of probes onto said one or more target nucleic acid molecules.
119. The method of claim 103, wherein said reaction mixture further comprises a hybridization reaction enhancing agent that enhances a rate of a hybridization reaction between said target nucleic acid molecule and said probe.
120. The method of claim 119, further comprising, subsequent to (b), removing said hybridization reaction enhancing agent from said cell.
121. A method for enhancing a hybridization reaction in a cell or cellular matrix, comprising: (a) providing said cell or cellular matrix and a reaction mixture, comprising (i) a target nucleic acid molecule, (ii) a probe having sequence complementarity with a target sequence of said target nucleic acid molecule, and (iii) a hybridization reaction enhancing agent comprising a polymer backbone, wherein said hybridization reaction enhancing agent enhances a rate of a hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target molecule; (b) subjecting said reaction mixture to conditions sufficient to conduct said hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target nucleic acid molecule, wherein during said hybridization reaction, said hybridization reaction enhancing agent facilitates said hybridization reaction between said target nucleic acid molecule and said probe having sequence complementarity with said target sequence of said target molecule; and (c) removing said hybridization reaction enhancing agent from said cell or cellular matrix.
122. The method of claim 121, wherein said removing in (c) comprises washing said hybridization reaction enhancing agent away from said cell or cellular matrix.